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Get Ready for SQL Server 


2008 SPI 


F: those of you who delay until Microsoft 
releases a service pack before you roll out a 
new version of a Microsoft product, the wait is al- 
most over. SQL Server 2008 SP1 is right around 
the corner. At the time of this writing, the SQL 
Server 2008 SP1 Community Technology Preview 
(CTP)—yes, even service packs now have CTPs— 
is available from Microsoft. 

Why wait for the service packs? The conven- 
tional wisdom is that Microsoft will use customer 
feedback to work out big problems between the re- 
lease and the first service pack. Just for the record, 
I found SQL Server 2008 to be rock solid from day 
one, but it’s always a good practice to keep up with 
the latest service pack. So what can you expect in 
SQL Server 2008 SP1? 


Changes to the Installation Process 
In the CTP for SQL Server 2008 SP1, you'll find a 
couple of important changes in the way that service 
packs are applied and used, as well as new func- 
tionality for SQL Server 2008. SP1 supports the 
ability to uninstall the service pack without need- 
ing to uninstall the base SQL Server instance. So if 
you install the service pack and something doesn't 
work the way it's supposed to, you can uninstall the 
service pack and leave your SQL Server instance in 
its original state. 


Slipstreaming 

If you're in the midst of rolling out SQL Server 
2008 instances to your organization, you should 
be aware that SP1 supports slipstreamed installs, 
which enable you to integrate the base SQL Server 
installation package with the updates from the 
service pack. Slipstreaming lets you create installa- 
tion packages for your new SQL Server 2008 that 
include the latest service packs and hot fixes. This 
ensures that your new SQL Server 2008 installa- 
tions will be up-to-date immediately after they're 
installed—without needing to first install the base 
installation package and then perform installa- 
tions for additional service packs and hot fixes. For 
more details about how you create a slipstream 
installation for SQL Server 2008, you can refer to 
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blogs.msdn.com/petersad/default.aspx. 
Michael Otey 


(motey@ sqlmag.com) is technical director 
for Windows IT Pro and SQL Server Maga- 
zine and author of Microsoft SQL Server 

2008 New Features (Osborne/McGraw-Hill). 


What’s New? 

In addition to the changes in the way service packs 
are applied, SQL Server 2008 SP1 contains soft- 
ware updates and security fixes. In terms of new 
features and enhanced functionality, it also in- 
cludes Report Builder 2.0 and the new ClickOnce 
installation, which will allow Report Builder 2.0 to 
be launched directly from Report Manager. 


Perform a Backup 

Before installing a service pack, it’s a good idea 
to perform a full system backup—including the 
system image. Even though Microsoft tests its ser- 
vice packs thoroughly, it’s better safe than sorry. If 
you're running multiple instances of SQL Server 
2008, remember that you apply service packs on 
a per instance basis. Perform a full backup for all 
production databases before installing the service 
pack. 


A Caution About CTPs 
A CTP for a service pack is just like a CTP for a 
product release—it’s informational only. You can 
use a CTP to test and evaluate new features to 
make sure that they work in your environment, and 
you can report any bugs you might find. However, 
a CTP is definitely not intended for production use. 
Don't load the SQL Server 2008 SP1 CTP on your 
production systems. That's what development vir- 
tual machines are for. Wait for the final release 
of SP1 before updating your production serv- 
ers. You can download the SQL Server 2008 
SP1 CTP from www.microsoft.com/down- 
loads/details.aspx?FamilyID=6f26fc45- 
f0ca-49cf-a6ee-840c7e8bb8af. 


How's SPI Treating You? 
If you have feedback on the SQL Server ser- 
vice pack process or if you want to share your 
experiences deploying SQL Server 2008 SPI, 
drop me a line at motey@sqlmag.com or 
letters@sqlmag.com. SOL 
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|18 Domainid 
|19 — Domainld 
|20 Domainid 
|21  Domainld 
| 22  Domainld 

Domainld 


Domainid 
Domainid 


Domainid 
Domainid 
Domainid 
Domainid 


Quickly Identify Columns 
with Mismatched Data Types 


0) rganizations typically use the same column name 
across different tables to denote the same item 
of data. For example, if you have a column containing 
domain IDs in several tables in a database, the col- 
umns’ names would be the same (e.g., DomainId) in 
all those tables. Those columns’ data types should also 
be the same because mismatched columns can cause 
problems. For example, mismatched columns can 
* Decrease SQL Server's performance. Columns with 
the same name in different tables (and views) are 
typically used to link the tables together (e.g., as 
keys and foreign keys). If the data types aren't the 


DATA TYPE % 
RequestDefinition 342 
RequestDefinitionAudit 3.42 
RiskAdjustmentSummary 3.42 
Risk Error 3.42 
Risk Total Summary 3.42 
Risk Value Summary 3.42 
IPNL1 3.42 
IPNLErmor1 342 
IRiek 1 3.42 
IRiskEror1 3.42 
MarketData 3.42 
PNLAdiustmentLeaacy 11 342 v 


( Query executed suc. FMF-S3-0357 (9.0 SP2) MARKETS\Stirkl (115) Paris 00:00:00 50rows 


Figure | 
Sample output 
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problems. Email your 
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number. We edit submis- 
sions for style, grammar, 
and length. If we print 
your submission, you'll 
get $100. 

—Karen Bemowski, 
senior editor 
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same, SQL Server has to convert one of the data 
types to make the columns comparable. This addi- 
tional work is unnecessary overhead. In some cases, 
it can also mean an otherwise useful index isn’t 
used, resulting in a further decrease in performance. 

* Interfere with data integrity. Client applications 
(e.g., stored procedures, ADO.NET clients) often 
need to define data types. An incorrectly defined 
column can potentially invalidate both the under- 
lying table and any client applications. Mismatched 
columns can also lead to client application errors. 
For example, if a column that's defined as tinyint 
should have been defined as int, a large value will 
cause an overflow error. 


At my company, I discovered that our databases 
often contained columns that had the same name 
but were defined with different data types. So, I cre- 
ated a utility to quickly identify which columns have 
mismatched data types across tables in a database. 
This utility uses the INFORMATIONAL SCHEMA 
.COLUMNS view. This view provides all the infor- 
mation needed to determine whether a column has 
mismatched data types—namely, the column's name 
and data type (COLUMN NAME and DATA 
TYPE), the name of the table in which the column 


resides (TABLE NAME), and the tables schema 
(TABLE SCHEMA). 

Heres how the utility works. The utility begins 
by first using INFORMATIONAL SCHEMA 
.COLUMNS to calculate column name density—that 
is, how often (in terms of percent) each column name 
appears across all the tables and views in a given data- 
base. The column names (COLUMN NAME) and 
column name densities ([%]) are stored in a temporary 
table named Prevalence. 

Next, the utility performs two INNER JOIN opera- 
tions. It first joins the INFORMATIONAL SCHEMA 
.COLUMNS view to itself, matching on the column 
name. The result set of the first join is then joined to the 
Prevalence table, matching again on the column name. 
Next, the utility searches the second join's results for 
data types that aren't the same to identify mismatched 
columns. The final result set is then sorted, with column 
name density being the primary sort criterion. 

Figure 1 shows sample results from running the 
utility. As you can see, the column named DomainId 
has been defined as a smallint, tinyint, varchar, and int 
in various tables. Once mismatches are identified, you 
can easily correct the data type for a given column. It 
might take a bit more effort to correct client applica- 
tions that use the column. 

The utility should prove valuable to DBAs because 
they can use it to identify and correct mismatched 
columns and thereby improve data integrity and SQL 
Server performance. It should also prove valuable to 
developers and quality assurance (QA) specialists. 
They can use it to ensure that columns in new and 
amended tables are defined consistently across different 
tables before those tables are migrated to a production 
environment. 

You can download the utility from the SQL Server 
Magazine website by going to www.sglmag.com, 
entering 100761 in the InstantDoc ID text box, and 
clicking the 100761.zip hotlink. The utility works on 
SQL Server 2005 and later. It’s currently set up to 
detect mismatched columns across tables in a single 
database and not across multiple databases. You might 
want to extend this utility to identify any mismatched 
columns across all the databases on a given server, 
perhaps leading to a basic data dictionary. You might 
also want to make other modifications, such as sorting 
the results by column name density and some other 
measure (e.g., the number of rows in the table, column 
usage) to give a column-importance metric that better 
meets your needs. SQL 

—Jan Stirk, freelance consultant, London 
InstantDoc ID 100761 
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Figure 2 


Sample results 


Need to Find an Object ButYou Don't Know 
Its Exact Name? Try this Search Engine 


ave you ever tried to look for an object in a SQL 
Server instance that has hundreds of databases 
without knowing the objects exact name and the 
database in which it resides? It can be quite time- 
consuming, especially if the databases include objects 
with similar names. To make the search quicker, I 


TypeName 

Table (user-defined) 

FOREIGN KEY constraint 
FOREIGN KEY constraint 
PRIMARY KEY constraint 
PRIMARY KEY constraint 

SQL inlined table-valued function 
SQL inlined table-valued function 
SQL inlined table-valued function 
SQL inlined table-valued function 
SQL inlined table-valued function 
SQL inlined table-valued function 
SQL inlined table-valued function 


ObjectName 

MSpublication_access 
FK_AccessDBSolLookup_AccessDBSolLookup 
FkK_AccessDBSolLookupValues_AccessDBSolLookup 
PK AccessDBSolLookup 

PK AccessDBSolLookupValues 

udf Access DaysPosts 

udt Access DayTrades 

udf Access EveryLedgerBalance 
udf_Access_EveryLedgerBalanceNetted 
udf_Access_LookupClientAccountld 
udf_Access_OpenHedgePositions 
udf_Access_OpenPositions 

Table (user-defined) 
Table (user-defined) 
UNIQUE constraint 
UNIQUE constraint 
View 


AccessDBSolLookup 

AccessDBSolLookupValues 

UG AccessDBSolLookup LookupDetail 

UG AccessDBSolLookupValues LookuplD Name Value 
viw_AccessDBSol_Cheuvreux_DayPosts 

View yvw_AccessDBSol_Cheuvreux_ELB 

vw. AccessDBSol Cheuvreux ELB Netted 

vw. AccessDBSol Cheuvreux Trading&ccountDetails 
eConstraintBrowser_SettingsAccessibilitySupport 
FUL_VIEW_ApplicationAccessQuestion 

sp DTÀ column access helper relational 

sp DTÀ column access helper xml 

sp DTÀ database access helper relational 

sp DTÀ database access helper «ml 

sp DTÀ table access helper relational 

sp DTÀ table access helper «ml 

sp salagent has server access 

PK AccessLog&ction 

PK AccessLogStatus 

usp_AccessLoglnsert 

usp_AccessLogUpdate 

AccessLogAction 


View 

View 

DEFAULT (constraint or stand-alone) 
View 

SQL stored procedure 
SQL stored procedure 
SQL stored procedure 
SQL stored procedure 
SQL stored procedure 
SQL stored procedure 
SQL stored procedure 
PRIMARY KEY constraint 
PRIMARY KEY constraint 
SQL stored procedure 
SQL stored procedure 
Table (user-defined) 
Table (user-defined) 
UNIQUE constraint 
UNIQUE constraint 

Table (user-defined) 


AccessLogStatus 

UG AccessLog&ction ActionDesc 

UG AccessLogStatus LogStatusDesc 
MSdistributor access 


created sp ObjectSearch. This stored 
procedure checks objects’ names for the 


query. As you can see, six databases contain objects 
whose name includes the string access. Besides 
specifying the databases name and the object’s 
name, the result set specifies the type of object. The 
sp ObjectSearch stored procedure handles many 
types of objects, including user-defined tables, views, 
primary key and foreign key constraints, SOL and 
CLR stored procedures, SQL and CLR Data Manip- 
ulation Language (DML) triggers, and SQL and 
CLR scalar functions. As Listing 1 shows, the stored 
procedure uses a simple CASE function to identify 
each object’s type. (For information about both 
simple and searched CASE functions, see “T-SQL 
101: The CASE Function,” April 2009, InstantDoc 
ID 100152.) 

The sp_ObjectSearch stored procedure works on 
SQL Server 2005 and later. For backward compat- 
ibility, I created sp_ObjectSearch_2K. You can down- 
load both stored procedures from the SQL Server 
Magazine website by going to www.sqlmag.com, 
entering 101693 in the InstantDoc ID text box, and 
clicking the 101693.zip hotlink. SQL 

—Shaunt Khaldtiance, senior DBA, 
CityIndex 
InstantDoc ID 101693 


ORE on the WEB 


Download the code at 
InstantDoc IDs 100761 
and 101693. 


LISTING |: Case Function that Identifies the 


Type of Object 


CASE type 
WHEN ''AF'' THEN ''Aggregate function (CLR)'' 
WHEN ''C'' THEN ''CHECK constraint'' 


string you specify. It searches through all 
the objects in each database within the 
current SQL instance. 


WHEN ''D'' THEN ''DEFAULT (constraint or stand-alone)'' 
To call the sp. ObjectSearch stored WHEN ''F'' THEN ''FOREIGN KEY constraint'' 
E WHEN ''PK'' THEN ''PRIMARY KEY constraint'' 
procedure, you use the syntax WHEN ''P'' THEN ''SQL stored procedure'' 
WHEN ''PC'' THEN ''Assembly (CLR) stored procedure'' 
WHEN ''FN'' THEN ''SQL scalar function'' 
sp ObjectSearch 'search string' WHEN ''FS'' THEN ''Assembly (CLR) scalar function'' 
WHEN ''FT'' THEN ''Assembly (CLR) table-valued function'' 
WHEN ''R'' THEN ''Rule (old-style, stand-alone)'' 
where search string is the target string. WHEN ''RF'' THEN ''Replication-filter-procedure'' 
WHEN ''SN'' THEN ''Synonym'' 
For example, suppose you need to find WHEN ''SQ'' THEN ''Service queue'' 
; : WHEN ''TA'' THEN ''Assembly (CLR) DML trigger'' 
an object whose name includes the word WHEN ''TR'' THEN ''SQL DML trigger'' 
access. You'd use code such as WHEN ''IF'' THEN ''SQL inlined table-valued function'' 
WHEN ''TF'' THEN ''SQL table-valued-function' ' 
WHEN ''U'' THEN ''Table (user-defined)'" 
EXEC master..sp ObjectSearch WHEN ''UQ'* THEN "'UNIQUE constraint'' 
: ; WHEN ''V'' THEN ''View'' 
access WHEN ''X'' THEN ''Extended stored procedure'' 
WHEN ''IT'' THEN ''Internal table'' 
END AS Type 


Figure 2 shows sample results from this 
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icrosoft built the Language-Integrated 
VI Query (LINQ) to SQL component in 
.NET Framework on top of LINQ to 
achieve one main purpose: Extend the general- 
purpose query syntax of LINQ to the world of 
SQL Server databases. As you may know, accessing 
in-memory data collections, databases, and XML 
documents today requires a familiarity with a variety 
of languages and syntax—one for each technology. 
With LINQ, you can use the same syntax to query 
different types of data. 
The LINQ query engine is extensible enough 
to support any collection of objects that exposes a 
made-to-measure interface—the IQueryable inter- 
face. This means that any team of developers could, 
in theory, expose to LINQ their own data and que- 
ry it through the common syntax—which is what 
Microsoft did with the LINQ to SQL and LINQ to 
XML interfaces. LINQ to SQL, in particular, wraps 
the content of a SQL Server database and makes it 
queryable through the LINQ syntax. We'll explore 
LINQ to SQL in depth here, examining the T-SQL 
basis for LINQ to SQL queries and walking through 
examples of using LINQ to SQL to code data-access 
functionality in .NET applications. 


LINQ to SQL and T-SQL 

LINQ to SQL doesn't work with databases other 
than SQL Server. So, unlike ADO.NET or industry- 
standard object-relational mapping tools, you can't 


JOINING AND GROUPING 


D 
use LINQ to SQL to work with, say, Oracle data- 
bases. (You can instead load content stored in Oracle 
databases through DataSets and query over that us- 
ing the LINQ to DataSet model.) LINQ to SQL still 
uses T-SQL under the hood to run its queries. LINQ 
to SQL doesn’t attempt to push an alternate route or 
replace T-SQL; it simply offers a higher-level set of 
query tools for developers to leverage. LINQ to SQL 
is essentially a more modern tool to generate T-SQL 
dynamically based on the current configuration of 
some business-specific objects. T-SQL operates on a 
set of tables, whereas LINQ to SQL operates on an 
object model created after the original set of tables. 

LINQ to SQL opens up SQL Server querying to 
the larger universe of .NET developers. However, 
with LINQ to SQL as well as with raw T-SQL, only 
T-SQL and SQL Server database experts can fine- 
tune the queries and remove any bottlenecks and per- 
formance hits. 


Preparing for LINQ to SQL 
LINQ to SQL works by exposing an object-based 
query language to developers and producing T-SQL 
statements in response to their input. With LINQ to 
SQL, you don’t explicitly use a connection string, nor 
do you open or close a connection explicitly. All you 
do is interact with a central console called the data 
context. 

The data context is a class that’s ordinarily created 
by the Microsoft Visual Studio 2008 wizard when you 
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Learn ways to 


code data-access functionality 


choose to add a new LINQ to SQL class 
item to the current project, as Figure 1 
shows. Adding a LINQ to SQL class to a 
project displays the Object Relational De- 
signer, through which the developer selects 
tables, stored procedures, and functions to 
add to the LINQ to SQL data context. 
The completed data context class incor- 
porates references to collections that rep- 
resent all selected tables. The data context 
source file also includes the definition of a 
class that represents an individual row on 
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each selected table. In other words, if you 
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Loe 
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select the Customers and Orders tables from 
the Northwind database, you'll have in the Name: 


‘LING to SQL classes mapped to relational objects. 


Northwind.dbmil 


data context class the properties that the C# 
code sample in Listing 1 shows. 

You'll work against these collections 
using the LINQ syntax. Any query you perform 
against the collections will then be translated to 
T-SQL statements and executed against the database. 
The data context class takes care of all this for you. 

The data context—usually a class derived from 
the base class DataContext—has a Log property, as 
the following example shows. This code registers a 
stream to capture all T-SQL statements actually gen- 
erated for each LINQ query to the database. 


NorthwindDataContext nwind = 
new NorthwindDataContext() ; 
nwind.Log = Console.Out; 


(Some code wraps to multiple lines in the print ver- 
sion of this article.) Through the Log property, you 
can track what really happens under the hood—that 
is, snoop on generated T-SQL statements—and de- 
cide to intervene with changes if you think the T-SQL 
code could be improved. 


Running a Query in LINQ 

The LINQ query language works on the LINQ to 
SQL object model in much the same way as T-SQL 
works on table rows. To express a simple query that 
selects customers from a given country and returns 
them ordered by name, in LINQ you use this query: 


var data = from c in nwind.Customers 
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where c.Country -- "Spain" 
orderby c.CompanyName 
select c; 


As you can see, in the query there's no explicit refer- 
ence to T-SQL; furthermore, the only point of con- 
tact between LINQ and the outside world is the data 
collection you work with. That collection is respon- 
sible for taking the input data (filter, order) and select- 
ing the results. Being a LINQ to SQL collection, it 
will do so by running a dynamically generated T-SQL 
query. As a result, LINQ to SQL offers a platform for 
you to express entities in the domain space as objects. 
You can then query entities and update them, using 
an object-oriented syntax. 

The code fragment contains context keywords 
added to C£: from, in, where, orderby, and select. 


LISTING |: Data Context Class Properties 


Figure | 


Adding a LINQ to SOL 
data context class to a 
.NET project 
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LISTING 2: ASP.NET Code EN 
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These keywords map to the underlying LINQ query 
engine. Their role is analogous to the role played by 
similar T-SQL keywords. You might wonder why 
select —the projection operator—appears at the end 
of the statement and from is instead at the begin- 
ning of it. The only reason for having this is the need 
to enable Visual Studio 2008 IntelliSense on LINQ 
expressions. Because the projection is expressed as a 
list of properties on any of the objects referenced in 
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keyword isn’t a sign of weak typing; no weak types 
are ever used. It is, instead, a sign that some dynami- 
cally generated code will run and the actual type of 
results may be anonymous. 

Data doesn’t flow in the assigned variable right 
away as the LINQ to SQL statement executes. The 
execution of the previous statement doesn’t fill the 
variable with any data. Instead, it simply prepares 
the ground for the execution of the T-SQL statement. 
The T-SQL statement is generated and configured 
but isn’t executed until the main code stream needs 
to process resulting data. In other words, as soon 
as the code consumes the content of the previously 
mentioned data variable, the T-SQL statement is sent 
to the database. For example, if you bind the content 
of the data variable to a control, triggering the data- 
binding process will actually trigger the following 
T-SQL statement: 


GridViewl.DataSource = data; 
GridViewl.DataBindO; 


This lazy loading feature can have some undesired 
effects if you don't master it properly. For example, 
consider the code in Listing 2, a code fragment 
from an ASP.NET page. You might want to 
run SQL Server Profiler to trace its execution. 
First, the code attempts to retrieve all orders 
shipped through the same shipping company. 
When the execution flow reaches the for each 
statement, the T-SQL statement highlighted in 
Figure 2 runs. The following is a simplified ver- 
sion of the statement: 


SELECT * FROM orders WHERE shipvia-3 


You may think that after running the first 
query to grab all matching records in the Or- 
ders table, you're fine and can work safely and 
effectively with any selected object. Well, not 
exactly. As Figure 2 shows, many additional 
queries actually hit the SQL Server database. 
Why is that? Let's expand on one of them, the 
exec sp executesql statement, in Listing 3. 
Note that all the additional queries are similar 
to the preceding query. All they do is retrieve 


Figure 2 
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the query, IntelliSense wouldn't, in fact, be possible 
if the projection were declared before the actual 
objects. 

Note that LINQ to SQL statements aren't run im- 
mediately. This is why you need to use a new C£ key- 
word—the var keyword—in the statement. The var 
keyword (the Dim keyword in Visual Basic) indicates 
that the developer might not know the data's type at 
writing time. By using var, the developer delegates the 
compiler to resolve the type at compile time. The var 


information about the customer who placed 
the order being processed in the foreach loop 
of Listing 2. 

As a result, you run one query to retrieve all the 
orders, plus one additional query for each selected 
order to access customer information. This is the de- 
fault behavior, which you might not have expected. 

By changing the parameters for lazy loading and 
specifying your fetch plan, you can force LINQ to SOL 
to run a totally different query and, more importantly, 
to get all order and customer data in a single shot. Here's 
how you need to modify your code to do so: 
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DataLoadOptions options = new 
DataLoadOptions(); 
options.LoadWith«Order»(o => 
o.Customer) ; 
dataContext.LoadOptions = options; 


By setting the LoadOptions property, you spec- 
ify a static fetch plan that tells the runtime to 
load all customer information at the same time 
order information is loaded. Based on this, the 
LINQ to SQL engine can optimize the query 
and retrieve all data in a single statement. List- 
ing 4 shows the SQL Server Profiler detail in- 
formation for the exec sp_executesq] query after 
the code modification. The query now includes 
a LEFT OUTER JOIN that loads orders and 
related customers in one step. 

What does this mean to you? LINQ to SQL 
queries have their own logic, and the underly- 
ing behavior might not be exactly what you 
expect. LINQ to SQL produces T-SQL state- 
ments, but it's a totally different engine—so 
don't make assumptions about LINQ to SQL 
behavior before you've verified how it works in 
practice. LINQ to SQL is a brand-new API for 
operating on a SQL Server database; it's not 
just another way of writing T-SQL code. Let's 
see now how to perform more advanced query- 
ing that joins and groups operations on data. 


Joining Data 

It goes without saying that many SQL Server 
queries are easier to formulate with words than 
with T-SQL. With experience, you'll probably 
become more adept at expressing queries direct- 
ly in T-SQL. However, for the vast majority of 
developers working on data-access components, 


LISTING 3: Query Used in exec sp executesql 


exec sp executesql 
N'SELECT [t0].[CustomerID], [t@].[CompanyName], [t0].[ContactName], 
[tø] . [ContactTitle], [t@].[Address], [t@].[City], [t@].[Region], 
[t@].[PostalCode], [t@].[Country], [t@].[Phone], [t@]. [Fax] 
FROM [dbo].[Customers] AS [tø] 
WHERE [t0].[CustomerID] = @pQ', 
N'@p@ nvarchar(5)', 
@p@=N'PICCO' 


LISTING 4: Exec sp executesql Query After Modifying 
the LoadOptions Property 


exec sp_executesql 
N'SELECT [t@].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], 
[t@].[OrderDate], [t@].[RequiredDate], [t9].[ShippedDate], 
[tø]. [ShipVia], [t@].[Freight], [t0].[ShipName], 
[t9]. [ShipAddress], [t@].[ShipCity], [t@].[ShipRegion], 
[t@].[ShipPostalCode], [t@].[ShipCountry], 
[t2]. [CustomerID] AS [CustomerID2], [t2].[CompanyName], 
[t2]. [ContactName], [t2].[ContactTitle], [t2].[Address], 
[t2].[City], [t2].[Region], [t2].[PostalCode], 
[t2].[Country], [t2].[Phone], [t2].[Fax] 
FROM [dbo].[Orders] AS [tøð] 
LEFT OUTER JOIN C 
SELECT [t1].[CustomerID], [t1]. [CompanyName], 
[t1] . [ContactName], [t1].[ContactTitle], [t1].[Address], 
[t1].[City], [t1].[Region], [ti1].[PostalCode], 
[ti].[Country], [t1].[Phone], [t1]. [Fax] 
FROM [dbo].[Customers] AS [t1] 
) AS [t2] ON [t2].[CustomerID] = [t%]. [CustomerID] 
WHERE [t@].[ShipVia] = QGp9', 
N'@p@ int', 
@pB=3 


LISTING 5: Sample LINQ Query 


var data = from c in nwind.Customers 

join o in 
(from orderInJan97 in nwind.Orders 
where orderInJan97.0rderDate.Value.Year == 1997 && 

orderInJan97.0rderDate.Value.Month == 

select orderInJan97) 
on c.CustomerID equals o.CustomerID into g 

select new ( c.CompanyName, OrderCount = g.Count() }; 


use select to return the entire customer row. 


finding a T-SQL counterpart for the following query 
could be difficult. Let's consider a query that returns 
for each customer the total number of orders placed 
in given timeframe. The query clearly involves a join 
between customers and orders. Listing 5 shows how 
youd write it using a LINQ to SQL object model and 
the LINQ query language. 

Here's a breakdown of the statement: You per- 
form a JOIN between the Customers table and the 
subset of the Orders table containing only the orders 
placed during January 1997. The JOIN is executed on 
the CustomerID column, and the results of the JOIN 
for the single customer record are saved into a group. 
Next, you simply select the columns you need from 
the customer and compute a Count operation on the 
related group of orders. 

In LINQ, you can use the select keyword to return 
an entire object or just a dynamically specified list of 
fields. In the following simple statement, in fact, you 
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var data - from c in nwind.Customers 
select C; 


If you want to restrict the projection to just a few 
fields, you use the select new variation, as follows: 


var data = from c in nwind.Customers 
select new {c.CustomerID, 
c.CompanyName} ; 


When you have more complex queries that involve 
JOIN and GROUPBY operations, you can project on 
joined fields as well and apply functions (Max, Min, 
Count, Avg) to groups of data. The syntax of the se- 
lect new keyword is flexible enough to accommodate 
an explicit property naming. In the earlier JOIN ex- 
ample, you explicitly set the name of the column that 
reports a customer's total orders to OrderCount. 
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LISTING 7: System-Generated Name for 
Returned Data Type 
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LISTING 8: Accessing Results of a 
Group-by Operation 


What about the type of the projected data when 
you use a select and a select new keyword? In the for- 
mer case, the returned type is a list of the specified 
type. For example, the following code will return a 
List<Customer> type. 


List<Customer> data = from c 
in nwind.Customers 
select c; 


When a select new is used, instead, an anonymous 
type is returned—so the var keyword is an absolute 
necessity to delegate to the compiler the task of re- 
solving the type dependency as late as possible. Note, 
though, that strong typing never yields to weak typ- 
ing in LINQ to SQL; using the var keyword changes 
instructs the compiler to resolve the type of the vari- 
able during compilation. For purely illustrative pur- 
poses, let’s see what the ToString method on the data 
variable returns. 


System.Collections.Generic.List'l 
[«»f. AnonymousType2 2 
[System. String, System. Int32]] 


The variable contains a list of values of an anony- 
mous, unnamed type comprising a collection of 
string and integer values. 


Grouping Data 
To group data in a LINQ to SQL expression, you use 
the group keyword. The companion by keyword, in 


conjunction with the new keyword, lets you shape 
the records that go in each group. The variable that 
receives grouped data will actually get a list of data 
containers each implementing the IGrouping<TKey, 
TElement> interface. Let’s focus on the expression in 
Listing 6 first. 

The query organizes all customers in groups ac- 
cording to the city, region, and length of the city 
name. Each group will contain each customer’s sole 
contact name, and empty groups are discarded. List- 
ing 7 shows the system-generated name for the re- 
turned data type. (Again, this is for illustrative pur- 
poses only; as a developer, you never have to work 
with the system-generated name explicitly.) 

Each element in the list you store in the data vari- 
able is an enumerable type that additionally has a 
key. The key represents the attribute that’s common 
to each value in the group. In the preceding example, 
the key is represented by city, region, and a Boolean 
value indicating whether the length of the city name 
is greater than 5. 

The results obtained from a group-by operation 
can hardly be used as-is to populate some UI control 
in an ASP.NET or Windows form. It’s therefore likely 
that you'll build a UI block by processing data in the 
group programmatically. The example in Listing 8 
shows how you'd programmatically access records in 
the groups. 

As Listing 8 shows, you use the Key property 
on each element in the returned collection to access 
the information you grouped by. The Key property 
is an anonymous type defined by the properties you 
specified through the by new keyword in the query. 
Next, for each group you can access specific record 
information in the format that you've indicated via 
the group keyword. With reference to the preceding 
example, record information in each group is only a 
customer's contact name. 


An Alternative Data-Access 
Method 
LINQ to SQL offers an alternative model to 
plan and develop the data-access layer of .NET 
applications. Through LINQ to SQL, you can re- 
alize a brand-new data-access layer and use auto- 
generated types to exchange data with the business 
and presentation layers. Any query that you express 
through the LINQ syntax is translated into a T-SQL 
statement and run. The execution of this statement 
is transparent to developers, and so is the dynami- 
cally generated T-SQL. LINQ to SQL is much more 
than just a high-level version of T-SQL, but keep in 
mind that it might not always be able to achieve the 
same results as you can achieve using raw T-SQL. 
SOU 
InstantDoc ID 101721 
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Running Aggregates 


Use these solutions for small partitions 


unning aggregates are calculations that 

are commonly used for data analysis. This 

article is the first of several articles covering 
various techniques to calculate running aggregates with 
a focus on performance. TIl discuss how to analyze the 
complexity of a solution, as well as explain how to pre- 
dict performance changes based on changing variables 
in the data or the solution. This article focuses on set- 
based solutions using subqueries and joins. 


Running Aggregates 
A running aggregate is an aggregate of a measure that 
keeps accumulating over an ordered sequence, possibly 
within partitions. Consider a table called Sales that 
contains employees’ daily sales quantities and values. 
Run the code in Listing 1 to create the Sales table in 
the tempdb database for demo purposes. (I provide 
the code to populate this table with sample data later 
in the article.) 

An example of a running aggregate is a running 
sum of the quantity (or value) for each employee and 
day. That is, for each employee and day, calculate the 
sum of quantity from the beginning of the employee’s 
activity until the current day. The elements involved 
in the calculation of a running aggregate include the 
measure you are aggregating (qty in our example), 
the ordering attribute (dt in our case), and if relevant, 
the partitioning attribute (empid). This 
is the most basic and classic type of 


LISTING I: 


solution before you can evaluate indexing strategies for 
it. With running aggregates, though, most solutions 
would benefit from the same indexing strategy. The 
ideal index is a covering one created on the partitioning 
columns followed by the ordering columns as the index 
keys, and the measures as included nonkey columns. 
One way to achieve such an indexing strategy is to 
create a clustered index on the partitioning columns 
followed by the ordering columns as the clustered 
index keys. Since the leaf rows of the clustered index 
will contain all other columns from the table, you get 
coverage of the measures as well. Our Sales table has 
such an index, which was implicitly created because of 
the primary key defined on (empid, dt). If you need 
your clustered index to be defined differently for your 
own reasons, another way to implement this 
strategy is to create a nonclustered index on 
the partitioning columns and sort columns 
as keys, and on the attributes holding the 
measures as included nonkey columns. The 
index definition in our case would look like this (don’t 
run this command, because we already have a clustered 
index that supports the desired strategy): 
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CREATE UNIQUE INDEX idx_runagg ON dbo 
.Sales(empid, dt) INCLUDE(qty, val); 


DDL Statement to Create Sales 


. s ., Table 
a running aggregate. Variations exist 
that have different boundary points to Tee ben P 


determine the window of rows that the 
aggregate operates on. However, I'll 
stick to the classic form since my focus 
is performance and in this sense the 
variations should be similar for the most 
part. 

Regarding indexing guidelines, 
typically you need to come up with a 
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IF OBJECT ID('dbo.Sales', 'U') IS NOT NULL DROP TABLE dbo.Sales; 
CREATE TABLE dbo.Sales 


empid INT NOT NULL, -- partitioning column 
dt DATETIME NOT NULL, -- ordering column 
qty INT NOT NULL DEFAULT (1), -- measure 1 
val MONEY NOT NULL DEFAULT (1.09), -- measure 2 
CONSTRAINT PK Sales PRIMARY KEY(empid, dt) 

5 

GO 
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LISTING 2: 
Function 


IF OBJECT ID('dbo.GetNums', 
DROP FUNCTION dbo.GetNums; 


DDL Statement to Create GetNums 


'IF') IS NOT NULL 


GO 

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE 

AS 

RETURN 
WITH 
Lg ASCSELECT 1 AS c UNION ALL SELECT 1), 
L1 AS(SELECT 1 AS c FROM LØ AS A CROSS JOIN LØ AS B), 
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), 
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), 
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), 
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), 


Nums AS(SELECT ROW NUMBER() OVER(ORDER BY (SELECT 9)) AS n FROM L5) 
SELECT n FROM Nums WHERE n «- Qn; 
GO 


LISTING 3: Code to Populate Sales Table with 
Sample Data 


DECLARE 
Gnum partitions AS INT, 
Grows per partition AS INT, 
Qstart dt AS DATETIME; 


SET Gnum partitions - 10000; 
SET Grows per partition = 10; 
SET Gstart dt = '20090101'; 


TRUNCATE TABLE dbo.Sales; 


INSERT INTO dbo.Sales WITH (TABLOCK) (empid, dt) 
SELECT NP.n AS empid, DATEADD(day, RPP.n - 1, Qstart dt) AS dt 
FROM dbo.GetNums(Gnum partitions) AS NP 
CROSS JOIN dbo.GetNums(Qrows per partition) AS RPP; 


LISTING 4: Set-Based 
Solution Using Subqueries 


SELECT empid, dt, qty, 
(SELECT SUM(S2.qty) 
FROM dbo.Sales AS S2 
WHERE S2.empid = Sl.empid 
AND S2.dt <= S1.dt) AS sumqty 
FROM dbo.Sales AS S1; 


Next, I provide instruc- 
tions to populate the 
Sales table with sample 
data before presenting 
solutions. First, run the 
code in Listing 2 to create 
a helper function called 
GetNums. GetNums 


accepts a number as input and returns a table result 
with a sequence of integers in the range 1 and the 
input number. Run the code in Listing 3 to populate 
the table with sample data. Notice in Listing 3 that 
you can set the number of partitions and the partition 
size. Currently, Listing 3 has 10,000 partitions with a 


partition size of 10. Later when I want to demonstrate 
the effect of changes in the number of partitions or 
partition size, I'll rerun the code in Listing 3 with other 
values as needed. 


Set-Based Solution Using 
Subqueries 

You might wonder why I'm covering both subqueries 
and joins in this article. The reason for this is that 
some people simply prefer to use subqueries, whereas 
others prefer to use joins. In addition, although some 
performance aspects are similar in both approaches, 
certain aspects of performance are different between 
the two approaches. 

Let's start with the subquery approach that Listing 
4 shows. The outer query is against the Sales table, 
aliased as S1. From the outer instance you return 
the employee ID, quantity, and date. A subquery is 
in charge of calculating the running aggregate. The 
subquery is against a second instance of the Sales table, 
aliased as S2. The subquery filters the rows from S2 
where the employee ID is the same as the one in S1, 
and the date is smaller than or equal to the one in S1. 
In the SELECT list, the subquery applies the SUM 
function to the qty attribute to aggregate all quantities 
from the qualifying rows. To evaluate the performance 
aspects of the solution, examine the execution plan for 
the solution query that Figure 1 shows. 

As you can see, the plan shows that the query was 
processed with a Nested Loops join. The outer input 
of the join is the result of a clustered index scan rep- 
resenting the outer instance of the Sales table that was 
aliased as S1. For each of those rows, the loop applies 
activity against the inner input of the join, which is the 
instance of the Sales table aliased as S2. This activity 
involves a clustered index seek followed by a partial 
scan to fetch all rows with the same empid as in the 
outer row, and a dt that is smaller than or equal to the 
one in the outer row. Those rows returned from the 
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Actual Number of Rows 550000 
Estimated Number of Rows 3 
Estimated Row Size 11B 
Estimated Data Size 33B 
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Execution plan for query in Listing 4 
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clustered index seek and partial scan are then aggre- 
gated with a Stream Aggregate operator. 

Now that you're familiar with the plan, let's 
evaluate its complexity. Although we could consider 
various aspects of the plan’s complexity, let’s keep 
things simple by focusing on the number of rows that 
are processed by the plan and that need to be aggre- 


SUBQUERIES AND JOINS 


LISTING 5: Set-Based 
Solution Using Joins 


SELECT Sl.empid, S1.dt, Sl.qty, 
SUM(S2.qty) AS sumqty 
FROM dbo.Sales AS S1 
JOIN dbo.Sales AS S2 
ON S2.empid = Sl.empid 
AND S2.dt <= S1.dt 
GROUP BY Sl.empid, Sl.dt, Sl.qty; 


28 
Xx X 
EI Parallelism Hash Match — — — Parallelism - 
(Repartition Streans) (Inner Join) (Repartition Streams) 
Cost: 22 % Cost: 21 * Cost: 3 * 
Actual Number of Rows 550000 
Estimated Number of Rows 550000 3€ 
Estimated Row Size 27B Parallelism 
Estimated Data Size 14 MB (Repartition Stream) 
: Cost: 3 * 
m 2,3 
=. : 
SELECT Parallelism Hash Match 
Cost: 0 * (Gather Streans) (Aggregate) 
LUC M Cost: 2 % Cost: 38 % 


by 


) Clustered Index Scan (Clustered) 
[Sales]. [PK Sales] [S1] 


Cost: 5 % 


Lif 


= 
|Clustered Index Scan (Clustered) 
[sales]. [PK Sales] [S2] 


Cost: 5 % 


Figure 2 


Execution plan for query in Listing 5 


gated. For this purpose, we need to take into consider- 
ation the number of partitions involved (call it p), and 
the average partition size (call it r). As you can see in the 
plan, the bulk of the cost is associated with the activity 
that is run in a loop scanning the relevant range of rows 
in the clustered index for each of the outer rows. This 
loop runs as many times as the number of rows in the 
table, which is roughly pr. For each of those rows, the 
number of rows scanned, as mentioned earlier, is as 
many as the number of rows with the same empid as 
in the outer row, and a dt that is smaller than or equal 
to the one in the outer row. For the first dt value for a 
given employee, there will be 1 match, for the second 
2, for the third 3, and for the rth r. This means that per 
each employee, the total number of rows processed is 
(1 € 2 * 3 ... +r). That's an arithmetic sequence, 
whose sum is (r + r°)/2. Therefore, the total number of 
rows processed is p(r + 1’)/2. As an example, currently 
there are 10,000 partitions in the table, each with a size 
of 10. Therefore the total number of rows processed 
and aggregated is 10,000 x (10 + 1052 = 550,000. 
Notice that in the actual execution plan in Figure 1 
this is exactly the number of rows that appear in the 
tooltip box reporting how many rows were returned 
by the Clustered Index Seek operator (total for all 
iterations). Later we'll see what the implications of this 
complexity are on performance in terms of changes in 
the variables involved. 


Set-Based Solution Using Joins 


Listing 5 illustrates the approach using a join. As 
you can see, the logic is similar to that applied by the 
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subquery-based approach. However, in the join 
approach, S1 and S2 are the left and right sides of the 
join, respectively. The same predicate that was used 
in the subquery’s filter (S2.empid = Sl.empid AND 
S2.dt <= S1.dt) now appears as the join condition. 
This means that each row from S1 will find as many 
matching rows in S2 as the number of rows that have 
the same empid as in S1, and a dt value that is smaller 
than or equal to the one in S1. The row in S1 will 
appear in the result of the join as many times as the 
number of matches that it has in S2. Therefore, in order 
to calculate the running aggregate, you have to collapse 
all copies of S1 by grouping the data by all elements 
from S1, and applying the aggregate to the measure 
from the S2 side of the join. Figure 2 shows the execu- 
tion plan for this solution. 

There are several differences to note in this plan 
compared with the plan for the subquery approach. In 
this case the optimizer chose a hash join, as opposed to 
the nested loops join used in the previous case. Don't 
let the fact that the plan shows only two scans of the 
data confuse you. One scan is used to build the hash 
table and another is used to probe the hash table, but 
the number of rows produced by the join that are later 
aggregated is the same as the number of rows processed 
by the nested loop join in the plan for the subquery 
solution. You can see this clearly in the tooltip box 
showing the number of rows streaming out of the 
Hash Match (Inner Join) operator. The complexity 
of the join-based solution in terms of the number of 
rows processed by the join operator and aggregated is 
p(r + r?)2 as well. 
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LISTING 6: Set-Based 
Solution Using Subqueries, 
Multiple Aggregates 


SELECT empid, qty, val, 
(SELECT SUM(S2.qty) 
FROM dbo.Sales AS S2 
WHERE S2.empid = S1.empid 
AND S2.dt <= S1.dt) AS sumqty, 
(SELECT AVG(S2.qty) 
FROM dbo.Sales AS S2 
WHERE S2.empid = Sl.empid 
AND S2.dt <= S1.dt) AS avgqty, 
(SELECT SUM(S2.val) 
FROM dbo.Sales AS S2 
WHERE S2.empid = Sl.empid 
AND S2.dt <= S1.dt) AS sumval, 
(SELECT AVG(S2.val) 
FROM dbo.Sales AS S2 
WHERE S2.empid = Sl.empid 
AND S2.dt <= S1.dt) AS avgval 
FROM dbo.Sales AS S1; 


SUBQUERIES AND JOINS 


Both plans in Figure 1 and Figure 2 
were captured for the given sample data 
with 10,000 partitions and a partition 
size of 10 rows. It's interesting to note 
that when playing with different num- 
bers of partitions and different partition 
sizes, the optimizer in some of the cases 
used the nested loops algorithm for the 
join-based solution as well. This was 
especially the case when testing larger 
partition sizes. 

Another difference to note is the 
point where the aggregate is calculated. 
With the subquery approach, the aggre- 
gate is calculated in the inner branch of 


complexity of the subquery-based solution in terms 
of the number of rows that will be processed and 
aggregated can be expressed as: ap(r + r°)/2. With 
the join approach, on the other hand, any number 
of aggregates can be applied to the result of the join 
based on the same scan of the data. Therefore, the 
complexity of the join-based solution is not affected 
by the number of aggregates and remains p(r + 17)/2 
even when multiple aggregates are involved. I’m not 
saying that calculating more aggregates doesn’t take 
more effort, but the relevant window of rows has to 
be scanned only once. 

Let’s make things more tangible by running the 
solutions and measuring their performance. Note that 
I ran all solutions with results discarded in order not 
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the join, per each outer row separately. With the join 
approach, the aggregate is calculated after the join, 
based on the result of the join. You can also observe 
that parallelism is handled differently in both cases. 


Execution plan for 
query in Listing 6 


Effect of Number of Aggregates 
When you evaluate a solution's performance, you need 
to be able to predict the effect of changes to certain 
variables. In this section I describe the effect of changes 
to the number of aggregates requested; I describe 
changes to other variables in subsequent sections of 

the article. 
Both the solutions I showed you earlier requested 
only one running aggregate. Can you predict the effect 
LISTING 7: Set-Based of changing the number of requested 
Solution Using Joins, aggregates? To do so, you need to be 
Multiple Aggregates aware of a certain limitation of the 
optimizer when it comes to optimizing 
subqueries. When optimizing a query 
with multiple subqueries, even when 
all subqueries need to access the same 
window of rows, unfortunately the 
optimizer uses a separate scan of the 
data for each subquery. This means 
that if you have a aggregates, the 


SELECT Sl.empid, S1.qty, S1.val, 
SUM(S2.qty) AS sumqty, 
AVG(S2.qty) AS avgqty, 
SUM(S2.val) AS sumqty, 
AVG(S2.val) AS avgval 

FROM dbo.Sales AS S1 
JOIN dbo.Sales AS S2 

ON S2.empid = Sl.empid 
AND S2.dt <= S1.dt 

GROUP BY Sl.empid, S1.dt, Sl.qty, 

S1.val; 
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to include the time it takes to print the result in the 
output. The code in Listing 6 shows a query imple- 
menting the subquery-based approach, asking for four 
aggregates. Figure 3 shows the right-hand part of the 
execution plan for this query. Notice that in addition 
to the initial scan of the data representing the outer 
instance of Sales called S1, the relevant data is scanned 
and aggregated four times for the four aggregates. The 
query shown earlier in Listing 4 with the single aggre- 
gate ran for 733 milliseconds on my system, whereas 
this one ran for 2,753 milliseconds—almost four times 
longer. 

The code in Listing 7 shows a query implementing 
the join-based approach, asking for four aggregates. 
The execution plan for this query is very similar to the 
one shown earlier in Figure 2 for the single aggregate 
request. You'll find that the same amount of data 
is scanned and processed. Naturally, the operators 
that calculate the aggregates will have a bit more 
work involved, but the amount of data scanned and 
processed doesn't change. The query shown earlier 
in Listing 5 with the single aggregate ran for 593 
milliseconds on my system, whereas this one ran 
for 846 milliseconds. As you can see, with the join 
approach the penalty for requesting more aggregates 
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isn't very high. Therefore, in this sense, the join-based 
approach is probably better if you need multiple 
aggregates. 


Effect of Number of Partitions 
From the algorithmic complexity of the solutions, 
you should be able to predict the effect of changing 
the number of partitions (while keeping the parti- 
tion size constant, of course). The complexity of 
both the subquery-based and join-based solutions is 
p(r + 1°)/2, only the subquery solution is also affected 
by the number of requested aggregates. If the number 
of partitions grows, the expectation is for the perfor- 
mance to degrade linearly. For example, if the number 
of partitions increases by a factor of f; you expect the 
run time to increase by a factor of f since the total 
number of rows processed becomes pf(r + 1°)/2. Now 
it's up to a benchmark to see whether practice agrees 
with theory. 

I ran a benchmark with number of partitions 
varying between 10,000 and 100,000, with a constant 
partition size of 10. Figure 4 shows the benchmark 
results, which confirm our expectations. Linear 
performance degradation is typically a good thing, 
of course. It’s also interesting to note that with 
such small partitions, the performance of the two 
approaches is quite similar, with the join approach 
being slightly faster. 


Effect of Partition Size 

Unlike the linear performance degradation caused by 
an increase in the number of partitions, the perfor- 
mance degradation resulting from an increase in the 
partition size is quadratic because of the r? part of the 
expression describing the number of rows processed. 
If the partition size grows by a factor of f, the number 
of rows processed grows by a factor of close to f£? 
because the expression describing the number of rows 
processed becomes p(rf + (rf)’)/2. Therefore, if the par- 
tition size increases by a factor of f; you should expect 
the run time to increase by a factor of f°. 

To test this theory, I ran a benchmark with a 
partition size varying between 100 and 1,000, with 
a constant number of partitions of 1,000. Figure 5 
shows the benchmark results, which again confirm our 
expectations. Quadratic performance degradation is of 
course typically a bad thing. Beyond very small parti- 
tion sizes, the performance becomes quite horrible. It's 
also interesting to note that with bigger partition sizes, 
the subquery approach tends to perform better; at least 
that's the case in the test system I used, with a single 
processor with two cores. Of course, the results on 
your system might vary with different hardware, as the 
number of processors and other factors might have an 
important effect, especially when parallelism is handled 
differently between the solutions. 
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SUBQUERIES AND JOINS 
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Graph showing effect of change in number of partitions 
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Graph showing effect of change in partition size 


Subquery vs. Join 
Both the subquery and join solutions perform reason- 
ably well when very small partitions are involved (up to 
several hundred rows per partition). As partition size 
grows, the performance of these solutions degrades 
in a quadratic (N?) manner, becoming quite poor. 
But as long as the partitions are small, performance 
degradation caused by an increase in number of parti- 
tions is linear. One factor that might affect your choice 
between using a subquery-based or join-based solution 
is the number of aggregates requested. As I discussed, 
the subquery-based approach requires a separate scan 
of the data for each aggregate, whereas the join-based 
approach doesn't—so you'll most likely want to use 
the join approach when you need to calculate multiple 
aggregates. Next month Ill continue exploring other 
solutions to running aggregates. SQL 
InstantDoc ID 101623 
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Join Us at 


Tech-Ed 
2009 


TechEd is one of the most 
significant IT conferences of 
the year, and TechEd 2009 is 

no exception. Windows IT Pro 
and SQL Server Magazine will 
have a significant presence 
at the show. Here are some 
highlights to watch for: 


ence activities, products and services 


spotligh 


north america 2009 


Win a Prize at the SQL Server Magazine Booth! 

Be sure to stop by the Windows IT Pro/SQL Server Magazine booth (#411) to sign up for one 
of our free giveaways and to chat with some of our editors and authors. We always like to hear 
feedback from readers, so let us know what you like (and don’t like) about our coverage. 


Author Sessions and Roundtables 
A few of our authors are giving presentations at the show. SQL Server Magazine 
Contributing Editor Kalen Delaney is speaking about solving real-world DBA 
issues. SQL Server Magazine Contributing Editor Itzik Ben-Gan 
will be speaking on T-SQL. Windows IT Pro Senior Contributing 
Editor Mark Minasi will deliver sessions on Windows Server 2008 
R2 AD features, Windows Kerberos, and Security with UAC/WIL. 


2009 Best of TechEd Attendees’ Pick Awards 

You'll also want to cast your vote in the 2009 Best of TechEd Attendees’ Pick Awards, which 
Wiw Ems) ets TechEd attendees pick their favorite products on display at the show. You'll 
find voting kiosks scattered throughout the show floor that you can use to log 
on to the contest website and cast your ballot. 


Live Blogs and Twitter Feeds 
We'll be covering the show with some live blogs and Twitter feeds, so be sure to bookmark the 
WindowsITPro.com and SQLMag.com websites and follow our Twitter accounts for updates: 
W Best of TechEd Awards: www.twitter.com/bestofteched09 
W Windows IT Pro: www.twitter.com/Windowsitpro 
E SQL Server Magazine: www.twitter.com/SQLServerMag 
W Jeff James: www.twitter.com/jeffjames3 


W Amy Eisenberg: www.twitter.com/witproamy 


E Sheila Molnar: www.twitter.com/sqlmagsheila 
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* Table fragmentation 
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* Resource contention 
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“Stor 
orage 


QL Server 
Environment 


Tuning your storage resources can save you 
time and improve SOL Server performance 


task that's frequently skipped when setting 
A up a new SQL Server system or application 

is proper capacity planning. With storage 
costs falling every quarter, it's easy to tell your storage 
administrator that you need 500GB of storage. But 
what have you really asked for? Will 500GB of storage 
last 6 months, 3 years, or 10 years? Without the answer 
to this question, you might not be asking for the right 
thing. Also, you need to consider the speed of your 
storage. Some storage is more expensive than other 
storage. Although your organization might not charge 
back when storage is requested, as a member of the IT 
staff it's your responsibility to ensure the department's 
resources are used effectively. Let's look at how to deter- 
mine how much storage you need, how to configure 
your disks, and the types of storage available. 


Capacity Planning 

The easiest part of capacity planning is determining 
how much storage space your database requires. You 
need to figure out the total number of bytes that you 
will need to use over n months. Although calculating 
this number might appear to be a rather daunting task, 
when you break it down to each table it's not as scary. 
First, add up the size of all the static lookup tables. The 
size of these tables won't change from month to month, 
so once you have that value you can just add it onto 
the end. For the tables that will be growing in size every 
day/month/year, you need to know a few key pieces of 
information. You need to know the average number 
of rows to be added per cycle. A cycle could be a day, 
a month, or a year—however often the data will be 
coming into the system. If it's a real-time system, such 
as an order entry system, ticketing system, or network 
security system, daily is probably the best assumption. 
You also need to know the average length of each 
row. The developers can probably help you answer 
this question. Also, you'll want to find out what your 
organization's record retention policy is. Do you plan 
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to keep data for a month, a year, or forever? You also 
need static numbers; in this case, the total number of 
bytes per data page on the disk is 8060. 

From here some basic arithmetic will provide you 
with the total number of bytes. For this example, we'll 
assume a daily cycle of 20,000 rows, with an average 
row length of 187 bytes, and a corporate record reten- 
tion period of 3 years. The first calculation that you 
need to do is to figure out how many rows fit into each 
data page. This calculation is important because SQL 
Server can't split a row between two data pages. Next, 
take the number of bytes per page and divide that by 
the number of bytes per row (8060/187), which gives 
you 43.101 rows per data page, as shown in Web Ta- 
ble 1 (www.sqlmag.com, InstantDoc ID 
101672). You'll want to round this number 
down because SQL doesn't page split, which 
gives you 43 rows per 
data page. Then take 
the number of rows 
per (daily) cycle and 
divide that by the 
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number of rows per 
data page (20000/43), 
which gives you 465.11 
as the number of data 
pages you need for the 
table, as shown in Web 
Table 2. Because you 
cant have part of a 
data page, round this 
number up to 466 data 
pages Although each 
data page can hold 
only 8060 bytes, the 
data page itself 1s actu- 
ally 8KB in size. Now 
take 466 data pages 
and multiply that by 
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STORAGE FOR YOUR SQL SERVER ENVIRONMENT 


RAID 5 (Single Parity Disk) 


White Data Pages are Writeable, Black Data 


Pages are for Redundancy 


RAID 5 (Dispersed Parity) 


data when the failed disk 
is replaced. Some SAN 
vendors will place all the 
parity information on a 
single disk, and some will 
spread the parity informa- 
tion across all disks in the 
RAID array. Although 
both techniques will give 
you the same level of pro- 
tection, having the parity 
spread across all the disks 
in the array will increase 
the speed of the array 
because the additional 
disk can be used for reads 


Figure | 


How RAID 5 and 
RAID 10 arrays 
can be laid out 
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8KB to get the total amount of data added to the table 
each day, which is 3728KB or 3.64MB, as shown in 
Web Table 3. If you stretch this number out to a month 
(30.5 days on average), you get 111.02MB per month. 
Now multiply that number by the retention period (3 
years) to get 3996.72MB or 3.9GB over three years, as 
Web Table 4 shows. 

After you’ve determined how much space each 
table requires, add the numbers to get the total space 
needed for the database over your retention period. 
If your retention period includes moving data to an 
archive database for reporting purposes, your actual 
data sizes will be smaller because some data is being 
removed from the tables. After figuring out how much 
space your data requires, you then need to think about 
the speed and redundancy of the disks that will be 
hosting your data. 


Configuring Your RAID Arrays 
On paper, everyone says to use RAID 10 for your 
database. It has the fastest write response and there’s 
no parity calculation overhead. However, when people 
typically talk about storage they forget one important 
thing: cost. Although RAID 10 will get you the best 
performance, it’s an extremely expensive RAID level to 
use. RAID 10 takes twice as many disks to configure, 
and those disks aren’t free. RAID 10 should be used in 
some cases, but deploying it to all your databases by 
default is typically an extremely inefficient use of your 
storage resources. The majority of databases that are 
configured for RAID 10 storage would probably work 
just fine on RAID 5 storage. 

It’s important to understand the difference between 
RAID 5 and RAID 10 before deciding which RAID 
level to use. RAID 51s also called a stripe set with parity. 
As the data is written to the disk, a parity calculation is 
done on each data page. This parity calculation is written 
to the disk with the data page and used by the array in 
the event of a disk failure to let the array rebuild the 


and writes. 

RAID 10 is also called a mirrored stripe. The 
number of disks in the array is split in half, with each 
half going into a single strip. These two strips are then 
mirrored against each other. In the event of a disk 
failure, the strip that contained that disk is unusable 
until the disk is replaced and the data is mirrored from 
the undamaged strip to the now repaired strip. Because 
there’s no parity to be calculated for each write, the 
writes can happen faster; however, because only half 
the disks in the RAID 10 array are used for reading and 
writing data, twice as many disks are needed to get the 
same amount of capacity. Figure 1 shows the two ways 
that RAID 5 arrays can be laid out, as well as how a 
RAID 10 array is laid out on the physical disks. 


Tuning the Cache 

One thing to remember when working in a SAN envi- 
ronment is that the SAN has cache—a lot of cache. 
That cache is set up to allow massive amount of writes 
to be cached, meaning that when SQL Server flushes to 
disk it’s not actually writing to the disks. Instead, the 
SAN is caching that write into its cache to be flushed to 
the actual disks at a later time. Because of this caching, 
the RAID level means nothing to SQL Server because 
SQL Server isn’t actually writing to the disks but rather 
writing to the cache. 

By default, most SAN systems come with the 
cache set up for 50 percent read cache and 50 percent 
write cache. Although this setup works great for 
things like data warehouse systems, it’s not so good 
for OLTP databases. SANs use read ahead to try and 
predict what information the host system is going to 
be looking for and read it into the read buffer before 
the host actually requests it. This way, the data has 
been read from disk and the SAN can simply give out 
the data from the read cache. However, in an OLTP 
application (e.g., databases, Exchange, file servers) the 
parts of the disk that need to be read from are rarely 
in sequential order. They're usually randomly placed 
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SPECIAL ADVERTISING SUPPLEMENT TO SQL SERVER MAGAZINE 


nless you've been vacationing off the 

planet Earth for the past few months, 

you may have noticed that companies 

are finding it much more difficult to 
spend money. Every organization wants to cut 
costs and run leaner, meaner, more efficient opera- 
tions that do more with less. This translates into 
shrinking budgets and in some cases, staff. One 
way to achieve this cost-saving goal is through 
consolidation. 

According to industry analysts, companies can 
typically realize cost savings in the range of 20% 
by consolidating their database environments. It’s 
no fluke that consolidation of SQL Server instances 
and databases has moved to the top of project 
lists for many companies. 

The original driver for many consolidation ef- 
forts was SQL Server sprawl: many organizations 
wound up having many more SQL Server instanc- 
es and databases than they originally thought, so 
reducing them into a more manageable number 
to contain costs—as well as standardizing deploy- 
ments and administration—made sense. SQL 
Server consolidation provides other benefits as 
well, such as better availability, improved resource 
usage, and the ability to grow. While SQL Server 
sprawl is still arguably the main technical and 
business driver behind consolidation, virtualiza- 
tion has become another strong driving force. 

Virtualization lets you run one or more guest 
operating systems under a single host. Examples 
of virtualization products are VMware's ESX Server 
and Windows Server 2008 with Hyper-V. These 
guest operating systems act just like a physical 
server. Users and applications would not be able 
to easily detect if their server is physical or virtual. 
Virtualization obviously saves companies money 
not only because a new server is not needed each 


Multiple Instances on Physical Hardware 


Pros 

e Tried-and-true method of consolidating SQL 
Server 

e Scale only limited by the physical hardware and 
the workloads placed on top of it 

e Easier to make consolidated SOL Server deploy- 
ments available 


Cons 

* To have better consolidation ratios, you need to 
avoid blade servers; blades won't really work with 
large-scale consolidation 

* Proper planning for multiple instances is not as 
easy as just spinning up a virtual machine 

* May need assistance via tools (e.9., WSRM) and 
features (e.g., Resource Governor) to help con- 
strain instances 


time one needs to be deployed, but because it 
reduces the time for deployment from months 
(which may include acquisition) to a matter of 
hours or days. 


A Trend Toward Virtualization 
Virtualization has been around for quite some time, 
but only in the past two or three years has it gained 
more significant popularity in the Windows-based 
world. IT organizations first started using virtualiza- 
tion for web servers, file servers, print servers, and 
application servers. But it should surprise no one 
that the spotlight eventually turned toward servers 
that run SQL Server. 

Since SQL Server 2000, SQL Server has had the 
ability to install multiple SQL Server installations 
per server. Each SQL Server installation is known 
asan instance. The problem with trying to deploy 
multiple instances with SOL Server 2000 is that 
both the hardware and the operating system (most 
early SOL Server 2000 deployments were done with 
an edition of Windows 2000 Server) were not as 
scalable as the systems and Windows server operat- 
ing systems available today. Everything was 32-bit. 
Going beyond 4GB of memory meant that all of 
it was fixed, processors were single core, and the 
maximum capacity of a lot of hardware was around 
8GB of memory. Needless to say, configuring more 
than two instances of SQL Server on a single stand- 
alone server or even a cluster was difficult at best. 

Fast forward nearly 10 years and it's a very differ- 
ent world. 64-bit is everywhere. Hardware is cheap 
and powerful — even a blade server will most likely 
outperform some of the largest boxes from 10 
years ago due to the advances in computer tech- 
nology. With either SOL Server 2005 or SQL Server 
2008 It's much easier to scale and deploy multiple 
instances of SOL Server using larger servers than 


Virtualization 


Pros 

e Great for development and QA as well as legacy 
systems that cannot otherwise be consolidated; 
reduces overhead and maintenance of those 
environments 

e Easy for deployment for a guest; can be as simple 
as a click of the mouse 

e Many IT shops already use virtualization, so there 
is very little learning curve 


Cons 

e Can wind up in a worse situation than before since 
each virtual machine is equivalent to a physical 
server; may lead to virtualized server sprawl 

e Guests may not scale as well as physical servers 
due to limitations imposed by the host virtualiza- 
tion platform 

* Deploying virtualized SQL Server guests can pose 
challenges in making them available 


In recent years, application consolida- 

tion has been a hot topic within many IT 
organizations. Why should customers consider 
database consolidation now? 


AN usual deployment scenario in Microsoft 
environments has had applications paired 
up with dedicated database servers. Over time, 
this has resulted in database server sprawl. We 
recently conducted benchmark tests using our 
ES7000 Model 7600R server and we were able 

to achieve a 64:1 consolidation ratio with an 
actual increase in performance. The economics of 
these tests were exceptional: they revealed that 
organizations can save up to 50 percent of the 
cost of maintaining their database infrastructure 
by consolidating multiple database servers onto 
a single, high powered enterprise server. These 
results included over 70 percent savings in FTE 
support costs as well as significant savings in 
datacenter facilities costs. 


With the trend toward smaller computers, 
why consider a larger server like an ES7000 
Model 7600R? Isn't it more expensive? 


AS data center needs a strong mix of 
server types and sizes. However, when 
considering any consolidation effort in today's 
economy, you have to think "optimization" When 
using virtualization or multiple database instanc- 
es, to get the best performance and consolida- 
tion ratio, it has to be done on a server that can 
support your workload and cumulative transac- 
tion rates. Unisys has shown consistently that a 
scaleable system like an ES7000 Model 7600R 
will cost less over time than multiple smaller 
servers while meeting or exceeding your service 
expectatons. | encourage you to talk to your local 
Unisys representative about how you can im- 
mediately achieve greater database performance 
while stretching your IT budget through consoli- 
dation of your SQL Server environment. 
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Why SQL Server? Why not just take the 
time and move everything to Oracle? Isn't it 
more enterprise ready? 


SQL Server has been an enterprise-class data- 

base platform for quite some time, and Unisys 
engages with many customers implementing 
mission-critical, scalable SOL Server deployments. 
In fact, Unisys customers host some of the world's 
largest SQL Server databases. Between 2006 and 
2007, SQL Server's market share grew 16.5 percent 
for a good reason. Microsoft SQL Server has proven 
its ability to handle the largest workloads from a 
performance, management and functionality per- 
spective. Unisys would love to talk to you about our 
Microsoft SQL Server success stories. 


Will consolidation help our efforts to be- 
come more green? 


Absolutely. One of the benefits of consoli- 

dation is that you are using fewer resources. 
In fact, our tests show that a consolidated 
database environment can save as much as 90 
percent of the power and cooling requirements 
of a typical database server sprawl environment. 


Mark R. Feverston 

VP, Unisys 

Systems & Technology 
Microsoft Solutions 
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information technology infrastructures for clients planning 
to more closely link IT with business objectives. He drives 
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needs and value, product/program requirements, develop- 
ment priorities, and timely delivery of new Microsoft based 
solutions. 


it ever was with SQL Server 2000. Unfortunately, 
many companies got burned by attempting multiple 
instances with SQL Server 2000, which left a 

bad impression. Despite the experiences with SQL 
Server 2000, multiple instances are a perfect fit for 
consolidation. 


SQL Server Instances vs. Virtualization 
Not less than a year ago, | was not in favor of virtual- 
izing SQL Server. But the call to virtualize SQL Server 
has grown louder and louder, the space around it has 
matured, and much more knowledge and guidance 
about how to virtualize a database server is readily 
available. The ultimate problem with this approach, 
though, is that the call for virtualization is not coming 
from the SQL Server experts: the DBAs. It’s coming 
from elsewhere, and many DBAs are forced into using 
virtualization whether they like it or not. That ap- 
proach is a recipe for disaster. This section will present 
the pros and cons of instances vs. virtualization for 
SQL Server consolidation so that the decision made is 
an informed one. 

First and foremost, if you want effective SQL Server 
consolidation with a good ratio of guests per host, or 
instances and databases to a physical server, consider 
acquiring smaller numbers of larger servers instead of 
just buying blades or pizza box-style servers. Do 
blade servers cost less, have good performance, 
and consume fewer resources than your current 
servers? They most likely do, but depending on how 
much consolidation that needs to be done, it can be 
cheaper to buy a few bigger servers to consolidate 
onto instead of many smaller ones. Plus, fewer physi- 
cal servers means less to manage in the data center 
(fewer network ports, SAN ports, and so on). Larger 
servers give you access to more processing power, 
more memory, and better overall scalability and 
growth. 


A No. 1 Concern 

Performance is the number one concern when it 
comes to consolidating production SQL Servers. That 
argument has a solid foundation: many virtualization 
products limit the amount of processor and memory a 
guest can utilize, and all I/O is shared with every other 
guest. I/O is always controlled by the host and how it 
deals with how much each guest can consume. Some 
virtualization platforms have tools and algorithms for 
dealing with these sorts of issues, but the reality is 
that large SQL Server deployments that utilize large 
amounts of memory and processor, and generate a lot 
of I/O, would not be candidates for consolidation via 
virtualization — or a candidate for virtualization at all. 
Physical servers with multiple instances of SOL Server 
may better utilize resources, and there are tools such 
as Windows System Resource Manager and the new 
Resource Governor feature in SQL Server 2008 that 
can help you manage processor and memory in a 
consolidated SQL Server environment. Before decid- 
ing on whether to virtualize or use physical servers, do 


some testing in your environments to see where the 
limitations would be related to performance. What 
you learn will help you make smarter decisions about 
what you may want to virtualize, what would be bet- 
ter consolidated on physical hardware, and what may 
be better left alone. 


Ensuring High Availability 

You should have higher availability after consolida- 
tion, not less. High availability can be a potential 
problem with a virtualized server. For example, 
failover clustering is one of the most popular meth- 
ods of availability for a SOL Server instance. Cluster- 
ing was designed with physical hardware in mind 
(redundancy via separate hardware). With virtualiza- 
tion, as ofthe writing of this paper, failover clustering 
is not supported for SQL Server instances installed 

in a guest. To make a virtualized SQL Server instance 
and its databases available, another method must be 
employed. Most of the popular virtualization plat- 
forms do have features to assist with availability of the 
guest, but in some cases you may need to deploy a 
method directly related to SQL Server (such as data- 
base mirroring) instead of or in addition to whatever 
is done at the host level. This could translate into re- 
doing an entire SOL Server availability strategy for an 
organization, which may wind up being expensive in 
terms of people and processes. It's important to note 
that Windows Server 2008 R2 will have the ability to 
not only cluster the hosts, but in the event of a node 
failure, switch guests to another node in the cluster, 
providing continuous availability. Because Windows 
Server 2008 R2 is not yet released, it's not known if 
SQL Server will support this functionality as soon as 
the operating system is released. It's certainly some- 
thing to watch. 


An Example to Ponder 

Consider this example: You consolidate 15 servers 
and 20 instances containing 250 databases onto 

a four-node failover cluster. Those four nodes are 
hosting seven clustered instances of SOL Server 2008 
along with the 250 databases spread across the seven 
instances. If any single one of the nodes fails, what- 
ever instances are currently running on that node will 
fail over to another node. If you chose to virtualize, 
you may not have a higher level of consolidation; the 
existing 15 physical servers may just be virtualized 
and placed under one (or more) hosts. If that host 
fails, everything running on it needs to have some 
sort of availability plan because it's like putting all 

of your eggs into one basket. Do you replicate at a 
storage level and have another virtual guest powered 
down and ready to go? That would be an expense 
over and above the cost of the host servers. Do you 
use SQL Server features such as database mirroring 
or log shipping to make the 250 databases available? 
That could work, but that is also a lot of work to get 
250 databases up in a reasonable amount of time. Do 
you have enough DBAs to handle that situation? Do 


Top 10 


Reasons to Consolidate 
SQL Server Instances 
and Databases 


4 Out of all of the applications and data- 

bases that are in your environments, how 
many are actually in use? This could be a chance to 
reduce costs during the consolidation process by 
weeding out the applications that are no longer in 
use and archive the databases. This not only takes 
care of the physical aspects associated with the 
application, but they would no longer need to be 
formally supported reducing costs of administra- 
tion. 


Are you in danger of being far behind the 

SQL Server deployment curve? A significant 
number of SQL Server 2000 deployments are still 
out there, and there are now two versions beyond 
SQL Server 2000 released. At some point, those 
old instances will become an albatross. Consoli- 
dating them is an opportunity to upgrade the 
databases and update the applications to ensure 
that they will have full support from Microsoft 
and your hardware vendors after the move. 


Can you ensure that every instance and/or 
database in your environment was designed 
and deployed with availability in mind? Chances 
are that only a few applications and server 
deployments are considered highly available. 
By building availability into the consolidated 
SQL Server environment, you will increase the 
availability of the backend, and by doing that, the 
applications which use that SQL Server back end. 


' How are the SQL Server instances and 
databases managed? Is each one done in 

the same way? Chances are they all have different 
standards for administration (if any maintenance 
or administration is being done at all). Consoli- 
dation allows you to ensure that the entire SOL 
Server environment is managed in the same way, 
reducing the overhead to all administrators (espe- 
cially the DBAs). 


6 Do you have some servers that are over 
utilized? While this is the exact opposite situ- 
ation of #5, most environments have at least a few 
SQL Servers that are constantly running out of 


space or out of capacity in other ways. Consolida- 
tion can not only be the time to right-size those 
instances and databases, but also ensure that 
there is enough capacity for future growth. 


Are the majority of the physical SQL Server 
5 servers underutilized at only 10 or 20 per- 
cent? This is common when most applications get 
their own backend. Consolidation allows you to 
optimize server utilization and give you better re- 
turn on your hardware investments and account 
for future growth, too. 


Another consequence of SQL Server sprawl is 

the cost of licensing. You may be out of compli- 
ance with your software vendors (including Micro- 
soft). Consolidation allows you to contain licensing 
costs and keep them manageable and predictable. 


3 Are you running out of or are you out of data 
center space for new deployments? Due to 
SQL Server sprawl, you are consuming valuable 
rack space, electricity, cooling, ports, and all other 
physical aspects with older, less efficient servers. 
Consolidating and replacing many servers with 
fewer, more efficient ones can reduce the overall 
cost to keep the servers running. 


2 Is each Windows Server, SQL Server instance, 
, and database configured differently? This is 
common in many environments, and is a head- 
ache for Windows administrators and DBAs. Con- 
solidation allows you to standardize the configu- 
ration and get down to a manageable few instead 
of everything being a one off. Standardization 
leads to more stable production environments. 


Do you have any idea how many SQL Server 

instances or databases exist in your envi- 
ronment? Are you constantly discovering new 
instances whose deployment you had nothing 
do to do with? Does every application get its own 
database server? If so, then you are suffering from 
SQL Server sprawl. Consolidation allows you to 
contain the sprawl and centralize SOL Server in- 
stance and database deployments going forward. 


you use a technology that is made for the virtualiza- 
tion platform? You could, but how does SQL Server 
behave if it’s used? These are the types of questions 
you must ask before you deploy, not after some- 
thing goes wrong. What on paper seems like an easy 
win for virtualization to the IT department can very 
quickly become an unmanageable solution for DBAs. 
If you already rely on certain availability features, 
techniques, and processes, they may be better 
implemented on physical hardware. Some large 
hardware boxes allow you to subdivide them into 
smaller servers, So you can save on physical space for 
cluster nodes if failover clustering is your main avail- 
ability strategy. Remember to match up what your 
consolidation goals are with your service level agree- 
ments to ensure they are in sync; do not pursue any 
consolidation strategy that will compromise a key 
component of your SQL Server deployments. 


Virtualizing Non-Production Environ- 
ments and Selected SQL Server De- 
ployments 

Virtualization is an almost guaranteed slam dunk in 
development and testing environments. It doesn’t 
matter whether developers or testers are given their 
own virtual environment, or whether they use a 
shared infrastructure. Using virtualization in these 
non-production environments makes sense and may 
be a good starting point to see how well virtual- 
ization fares. One major benefit is that if develop- 
ers mess something up, they may be completely 
isolated. Getting them back up and running can be 
as simple as copying the virtual image. Virtualization 
is generally not suited for performing tasks such as 
stress tests and benchmarking. 

One of the best uses of virtualization is virtualizing 
older servers with configurations that for one reason 
or another cannot be upgraded or easily moved or 
consolidated. An example would be servers that have 
SQL Server 6.5 or SQL Server 7.0 deployments that still 
must be kept running (for whatever reason). These 
are servers which generally no one knows about (the 
person who deployed them is long gone), and no 
one wants to touch them for fear of breaking them. 
Virtualization is the only solution for getting rid of the 
physical server since there may be no way to upgrade 
the database server to a later version of SQL Server. 

Can virtualization be used for production SQL Serv- 
ers? Absolutely, but it will definitely not be a perfect 
fit. For example, you're most likely not going to take a 
currently overloaded four-processor (each processor 
being dual core) system with 8GB of memory and its 
own dedicated disk subsystem, convert it to a virtual 
machine, and expect to get better performance if 
the guest is now going to be constrained by being 
allocated only a portion of physical processors and 
a shared disk subsystem. There's a chance it could 
perform better, but in reality it may be worse. When 
considering virtualization for production SOL Server 


deployments think about the virtualization host 
platform's limitation and set up guidelines for deploy- 
ments. You don't want a situation where you need to 
extract it from a VM and make it physical. 


The Hidden Costs of Virtualization 

Keep one major virtualization caveat in mind: from a 
Windows and SQL Server administration standpoint, 
it's the same as having a physical server to manage. 
You're trying to reduce your footprint and improve 
administration. If you virtualize SOL Server and end 
up with a 1:1 ratio in comparison to where you are 
now (with the exception of less hardware), not much, 
if anything, will have improved. Instead of having 
SQL Server sprawl, you'll now have virtual sprawl. By 
no means am | suggesting that you not virtualize; the 
message is that you should not be overzealous if you 
do choose to go down the virtualization path. You 
could quickly end up where you started. Every virtu- 
alization platform will require specialized tools and 
utilities to manage virtualization in addition to what 
needs to be done to manage each server, instance, 
and database on a day-to-day basis. If your IT workers 
are already overwhelmed, adding more complexity 
to their job may not be the right decision to make. 


Summary 

The question for IT organizations and DBAs is no 
longer whether or not to consolidate SOL Server 
instances and databases. The new question that 
must be asked is: How should this consolidation be 
accomplished? Both multiple instances deployed on 
physical hardware as well as virtualization are valid 
choices for SQL Server consolidation. Chances are 
you'll employ a strategy that uses both methods. It's 
not a black and white decision, as some may think. 
The only guarantee is that no matter what method 
you choose for consolidation, you'll need to buy new 
hardware to support the consolidation. You should 
work with your preferred hardware vendors to see if 
buying fewer, larger servers will be more cost effec- 
tive, highly performing, more easily managed and 
provide the growth you need rather than buying 
larger numbers of smaller, cheaper servers. 
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throughout the drive. The read ahead might actually 
cause more load than the benefit it’s providing. You 
might be better served to reduce the ratio of read cache 
to write cache from 50/50 to 30/70 or 20/80. Reducing 
the ratio lets SQL Server write more data to the cache 
where it really needs the speed. 

This cache change also improves the case for RAID 
5 disks. Because SQL Server will actually be talking to 
the disks only when it’s reading from them, you want the 
most disks processing the read as possible. With RAID 
10, only half the disks are used for reading data, and 
with RAID 5 all of the disks (or all but one depending 
on how the SAN vendor handles RAID 5) are used. 


Storage Tuning 

In addition to tuning your cache, there are some things 
that you can do at the LUN and disk level to increase 
performance. You might want to experiment with dis- 
abling the read cache on the LUN itself. Doing so will 
prevent any caching of information from the LUN into 
the SAN’s cache. The reason for this is that SQL Server 
caches a lot of information into its own cache within the 
servers RAM. It rarely needs to go to the disk for data 
to begin with, and when it does, the data won't be avail- 
able in cache. Once the data is in the SAN’s cache, it will 
also be in the SQL Server system's cache. There's only a 
slight performance hit when having read cache enabled 
on your database LUN. The SAN will attempt to read 
ahead when SQL Server is doing reads to the disk to try 
and have the data that SQL Server is looking for avail- 
able in cache so that SQL Server doesn't have to wait 
for a call to the physical disk to get its data. Because of 
the random nature of OLTP databases, the chance of 
the SAN being able to pull the correct data to cache is 
almost none. In OLTP systems where the disk is already 
strained, having the read cache enabled might slightly 
affect performance; however, in systems where the disk 
isn't already strained, leaving the read cache enabled 
shouldn't negatively affect performance. 

Atthe disk level, an important yet often overlooked 
part of storage tuning is correctly aligning the disk. 
When the specifications for the master boot record 
(MBR) were written, storage was pricey. Therefore, 
the MBR was set up to be 63 blocks on the disk, so the 
first bit of data goes in the 64 block. This setup is prob- 
lematic because disks like to do everything in 64-block 
chunks. So because the data is now offset by 1 block, 
each time SQL Server wants to read or write, twice as 
many operations need to be done. In applications such 
as Microsoft Exchange or Oracle ASM where all disk 
activity is done in 8KB (8 block) reads and writes, this 
effect isn't as pronounced. But SQL Server does all its 
disk access in 64K B (64 block) reads and writes. The 
net effect is that every read and write operation requires 
twice as many physical operations to the disk. 

This problem can be fixed fairly easily if the disk is 
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created by using the diskpart.exe command-line tool in 
Windows Server 2008 and Windows Server 2003. After 
the LUN has been presented to the server, log on to the 
server and launch diskpart.exe from a command-line 
window. Then run the command 


SELECT DISK 3 


where 3 is the disk number you just added to the server. 
Then run the command 


CREATE PARTITION PRIMARY ALIGN-64 


This command will create a primary partition on the 
disk with the alignment set to the 64 block of the disk 
instead of the 63 block of the disk. Do this for each 
LUN that has been presented to the server. 

If you have LUNs that aren't correctly aligned, 
this change isn't as easy to do because there's no 
tool to correct this alignment after the partition has 
been created. The easiest 


method is to back up your Although RAID 10 will get 


database and delete the 


database from the server. you the best performance, 


Then remove the partition 


and recreate it with the It'S an extremely expensive 


correct alignment. Next, RAID level to use 


restore the database to the 

disk. Some SAN vendors have client-side migration 
tools that can be used to perform this process online, 
and there’s third-party software available to perform 
this process as well. However, don’t use a back-end 
SAN migration tool because it will copy all the data, 
including the misaligned partition information. 


Shared Everything vs. 

Shared Nothing 

There are two basic techniques for setting up storage: 
shared everything and shared nothing. With the shared 
everything technique, all disks in the SAN are used for 
every LUN. Although this technique looks great on 
paper, it has some drawbacks. If any one disk in the 
array begins to slow down, then every LUN on the 
array is affected. The shared everything approach is 
good if you don’t have a full-time SAN administrator, 
because there isn’t much to do besides create LUNs and 
assign them to storage. 

With the shared nothing approach (more correctly 
called the shared some approach), LUNs are created 
in smaller RAID groups throughout the SAN. Small 
numbers of physical disks are created in RAID groups 
of various configurations and sizes and the LUNs are 
placed within these RAID groups. The good thing 
about this configuration is that if a specific LUN or 
disk starts to cause a slowdown, the only LUNs that 
are affected are the ones within that LUN's RAID 
group. The downside to this system is that it requires 
more hands-on configuration and more knowledge 
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of the I/O levels that you need to support. The upside 
is that if a LUN needs more I/O, you can move it to 
another RAID group or even have it span more than 
one RAID group to give it more I/O throughput. 


SAN vs. DAS 

Today, the SAN is king in the world of storage. The 

SAN is very fast and usually made up of hundreds 

of Fibre Channel drives, all of which are connected 

to the server via high-end fiber cables. However, 
SAN solutions are very 
expensive, often costing 


The sh ared eve rything hundreds of thousands of 
approach is good if you dollars just to get started. 


This brings us to DAS. 


don't have a full-time SAN PAS is a great solution 


if your company doesnt 


administrator because there need a large-scale SAN 


solution. DAS offers you 


. , . 
isn't much to do besides pretty much all the same 
create LUNs and assign capabilities as a SAN, just 
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at a smaller level, and 


them to storage. usually with less cache. 


DAS devices are made up 
of anywhere from 2 to 45 SCSI, SAS, or SATA drives, 
with anywhere from a few megabytes of cache to a 
gigabyte of cache. 

All the same configuration options are usually 
available with DAS as with SAN, including read cache 
and write cache balancing. Just like SAN drives, DAS 
drives still need to be aligned (as do all RAID arrays). 
Unlike a SAN, a DAS usually won't let you control 
cache settings on a disk-by-disk level. Also, DAS 
systems are connected to a specific server, so you can't 
have more than one server connected to a DAS, as you 
can with a SAN. 

The biggest drawbacks to using DAS for your 
databases are the lack of growth potential, because 
there's a physical limit on the number of disks you can 
put into a DAS, and the fact that you can't pool your 
resources between multiple servers with DAS devices. 
Another problem you might run into with DAS devices 
is that you can't spread a single RAID array beyond 
the number of disks in a single drive shelf. Most DAS 
drive shelves hold 15 drives, and each drive shelf is 
connected to a single RAID controller. Because the 
RAID controllers don't talk to each other, you cant 
use RAID between drive shelves. However, these draw- 
backs are offset by the very attractive entry cost. DAS 
storage is inexpensive when compared with the cost of 
purchasing a SAN solution. 


iSCSI and the Database 

A newer storage technology that's just breaking into 
the database world is iSCSI. iSCSI uses your existing 
Ethernet to connect the database to the storage. Many 


SAN vendors now include iSCSI as part of SAN 
systems. The upside to iSCSI is that you get SAN 
quality storage without the additional cost of the Fibre 
Channel equipment. The downside to iSCSI is that all 
your storage traffic now has to compete for bandwidth 
with your regular Ethernet traffic. If you have a slow 
or extremely busy Ethernet network, iSCSI probably 
isn't the solution for you. Another potential problem 
with iSCSI is that because your storage traffic is going 
over the TCP network, you have to deal with the TCP 
timeout settings. If your storage array fails for some 
reason, Fibre Channel will time out very quickly and 
retry along another path. However, the timeout for TCP 
packets is much higher, which will be represented within 
your database as blocking or processes timing out. 

To address network bandwidth issues, network 
engineers use a technique call VLANing to separate 
and isolate network traffic. VLANing ensures that 
traffic on your iSCSI network is isolated from other 
parts of your network. This isolation is a logical 
separation of network segments into Virtual LANs 
(VLANs) at layer 2 of the Open Systems Intercon- 
nection (OSI) Reference Model. A VLAN is a logical 
domain within the network switch that lets only 
traffic that has been tagged for that VLAN access 
that network switch. Traffic within a VLAN, including 
broadcast traffic, is kept isolated within that VLAN. 
Any traffic destined for another VLAN will have to be 
routed. This helps prevent non-iSCSI network traffic 
from affecting your iSCSI traffic and helps prevent 
your iSCSI traffic from affecting other non-iSCSI 
traffic. VLANing works at layer 2 and separates 
traffic only at the layer. To further isolate traffic for 
applications that require layer 3 switching or routing, 
VLANs can be associated with a routing subnet that 
will let traffic cross logical domains (VLANs) to com- 
municate with other VLANs. Furthermore, layer 3 
routing can also be configured with an ACL to allow 
or prevent specific traffic within specific subnets from 
communicating with each other. Other techniques at 
layer 4 and layer 5 are also available, but that’s beyond 
the scope of this article. 


Get the Best Performance Out 
of Your SQL Server 
Properly setting up and configuring the physical disks 
is essential to getting the best possible performance out 
of your server. This is especially true with SQL Server 
(or any database server) because database servers read 
and write data to and from the disk in much higher 
quantities and at much higher speeds than other servers 
in the enterprise. Therefore, you want to make sure that 
every issue is addressed when configuring your storage, 
especially when you consider how much time it takes to 
make these changes after implementation. SQL 
InstantDoc ID 101672 
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Focus on Files and 


File Groups 


In an effort to improve performance, don't 
create too many files in your database 


[ you're like most SQL Server DBAs today, you 
often wonder how many data files or file groups 
you should have when you create a new user database. 
If you're not sure about the difference between a file 
and file group and the purpose of each, a data file is an 
OS file in which SQL Server stores the contents of the 
database (e.g., data, indexes, metadata) on the physical 
storage medium, and a file group is a logical container 
for managing or grouping one or more data files. You 
can choose which file group you want to associate a 
database object to, and SQL Server then stores the data 
or indexes in the associated file or files in that group. 
(For more information, refer to SQL Server Books 
Online—BOL-—for these topics before proceeding.) 

I generally see one of two scenarios regarding the 
number of files and file groups for a given database. 
In the first, most common scenario, users utilize the 
default settings and create a single file in a single, pri- 
mary file group. In the second, more extreme scenario, 
users create many, many files in one or more file groups. 
Of course, these scenarios represent two ends of a spec- 
trum of possibilities, but these are the most popular 
methods people use to create new databases. As you'll 
see, neither is desirable for most situations. 


When Defaults Aren't Enough 
Because this article series is geared primarily toward 
beginners, l'll assume that your databases arent 
extreme either in size or in usage. Most of you can 
operate effectively with just a single data file per file 
group. Having said that, I want to address the first sce- 
nario, in which the user creates databases with a single 
filein the default file group. To minimize recovery times 
and maximize uptime, you should avoid this scenario 
when using SQL Server 2000 or later. 

Let's say you're doing a piecemeal restore: You 
must restore the primary file group first; so, if you have 
user data in the primary file group, you might hamper 
future actions or limit your possibilities by making the 
process longer or more complicated than necessary. At 
a minimum, you should create a secondary file group 
and make it the default for all user objects at the time 
of or immediately after database creation. Because all 
the system objects get created in the primary file group 
when the database is first created, you'll effectively 
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separate user objects from system objects, which is a 
sensible thing to do anyway. 

So, does that mean if you create a secondary file 
group with a single data file, you're always good to go? 
The Microsoft article “Storage Top 10 Best Practices" 
(http://www.microsoft.com/technet/prodtechnol/sql/ 
bestpractice/storage-top-10.mspx) says otherwise: “It is 
recommended to have .25 to 1 data files (per file group) 
for each CPU on the host server." Most DBAs read that 
and immediately begin creating lots and lots of files. If 
you've gone this route, you're certainly not alone. But 
the header actually states that the advice is for “Alloca- 
tion Intensive Workloads”—which most SQL Server 
applications aren't. If you were creating thousands 
of tables or allocating thousands of new extents each 
second, the recommendation would apply to you. 
However, most well written applications don't resort 
to such behavior. Something that often does, however, 
Is the tempdb database, and it's a good practice to have 
multiple files under those conditions, as outlined in the 
Microsoft article *Working with tempdb in SQL Server 
2005" (http:/Avww.microsoft.com/technet/prodtechnol/ 
sql/2005/workingwithtempdb.mspx). 


Groups Are Good 

Now that I’ve pointed out some of the fundamentals 
and misconceptions about files in the database, I want 
to focus on file groups. In my opinion, you need to spend 
considerable time thinking about file groups before you 
create the database in the first place. File groups give you 
the ability to separate user objects from one another. 
Because you can't span a file group with any one file, you 
can never have data from objects assigned to different file 
groups residing in the same physical file. What does this 
buy you? Potentially, a lot. 

Suppose you have multiple physical disk arrays 
available to your SQL Server system. You can place 
the file from file group A on one disk array and the 
file from file group B on another. You've now isolated 
the physical I/O—at the disk level—of objects in one 
file group from that of objects in another file group 
because an object can only reside in one file group. 
This scenario would be impossible if there were 
only a single file group for all the user objects in the 
database. 
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Filegroup Name Logical File Name Physical File Name Space Reserved 

PRIMARY Bad_DB C:\Date\2008DBs\Bad_DB.mdf 10.00 MB 
Bad_DB_Data_2 C:\Data\2008DBs\Bad_DB_Data_2.ndf — 10.00 MB 
Bad, DB, Data 3 CADatai2008DBs!Bad DB, Data 3ndí 10.00 MB 
Bad DB Data 4 C3Datal2008DBs!Bad DB Data 4.ndf 10.00 MB 

Figure 1 A common practice is to have one file group 

: for clustered indexes and another for nonclustered 

A single file group 


Filegroup Name 


Logical File Name 


indexes, which is fine—but the possibilities are end- 
less. You need to predetermine your goals for the 
database and decide how you want to break out your 
objects. If improved performance is your key goal, 
and you have more than one disk array, you might 
want to place the tables that are joined the most onto 
separate file groups. That way, you can take advantage 


Physical File Name 


Space Reserved 


created with four files in a single 
file group, which also happens 
to be the PRIMARY file group. 
There's no way to separate activity 
by database objects when they're 
all together as you see here. 

Now, take a look at Figure 2, which shows three 
files and three file groups in total. The PRIMARY file 
group is totally separate and contains all the system 
objects with no user objects. The two other file groups 
can be used to segregate your user objects, making it 
much easier to move them to another physical disk 
array if the need arises. For instance, if you always join 
two large tables, you can place each table in a different 
file group and place those files 
on separate disk arrays to maxi- 


Space Used 
1.31 MB 


64.00 KB 
64.00 KB 


£4.00 KB 


Space Used 


PRIMARY Good, DB C:\Data\2008DBs\Good_DB mdf 10.00 MB = ize I/O throughput for th 
User Objects | Good DB Data 2 CADatal2008DBs\Good_DB_Data_2ndf 1000 MB 64.00 KB hdc: c cic 
User_Objects_2 Good DB, Data 3 C'Datei2008DBs!Good, DB, Data, 3.ndf 10.00 MB 64.00 KB operations. 
Figure 2. of different I/O paths when you read from disk. If 
Three file groups your goal is to maximize uptime or minimize backup This is all food for thought; everyone has different 


and restore times, you can place static tables into a 
read-only file group and negate the need to back up 
the entire database each night. 

For example, Figure 1 shows a typical database 


needs. But clearly, most of you can benefit from 
focusing on how to better utilize file groups than on 
how many files you need. SL} 
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Integrating 
pase 


SQL Server 


This web-services approach offers 


minimal coding 


ith SharePoint’s growing popularity in the 
W business world, it's become an important 

repository of data. However, this data is 
often isolated from the rest of an organization's data 
system. But by using the native web-service calls in 
Windows SharePoint Services (WSS), you can access 
SharePoint list data and use SQL Server Integration 
Services (SSIS) to transform the resulting dataset 
from an XML source to a relational database table. 
This import routine can be automated via SQL Server 
Agent and scheduled to run at regular intervals to 
keep the SharePoint list and relational database in 
sync. That way, SharePoint list data need not be a 
data island in the midst of an otherwise integrated 
data environment. 


SharePoint Data Storage 
and Access 
Before we walk through the step-by-step process to 
integrate SharePoint data with SQL Server, let's look at 
list-data storage and retrieval in SharePoint. SharePoint 
pages are ASP.NET applications whose content resides 
in a back-end SQL Server database called the content 
database. Other SharePoint-related data, such as list 
data and document library attachments, is also stored 
in the content database. Although server administrators 
might be able to access the SharePoint content database 
and the underlying tables, Microsoft doesn't recommend 
or support accessing SharePoint data this way. 
However, Microsoft does support other ways of 
accessing this data: by using the SharePoint Service 
object model or by using SharePoint Services web 
services. The object-model approach is more develop- 
ment intensive and requires familiarity with a myriad 
of object-model namespaces, hierarchies, classes, and 
enumerations. The web-services approach provides 
a more straightforward alternative. Although it still 
requires some coding, it's fairly lightweight in com- 
parison and is the approach we'll use. 
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SharePoint Web Services 

and SSIS 

SharePoint Services web services include methods 
for accessing data on a website, such as for working 
with lists or site data, as well as methods for custom- 
izing content such as meetings, imaging, document 
workspaces, or search. The web services provide their 
functionality through the / vti bin virtual directory, 
which maps to the %PROGRAMFILES%\Common 
Files\Microsoft Shared\web server extensions 12 
ISAPI physical directory in the file system. The Lists 
web service for example, which is what you would 
call to access list data, can be accessed from the path 
servername/sitename/ vti bin/lists.asmx. 

The web-service method calls often require input 
parameters, typically in plain text or CAML format. 
CAML is an XML-based markup language used exclu- 
sively with SharePoint technologies. As with any web- 
service method call, the result set is in XML format. 

SSIS provides a platform to build a data- 
integration application. Although it uses a Visual 
Studio (VS) 2005 interface to build an application, 
it features drag-and-drop usability for rapid develop- 
ment. It also comes standard with a wide array of 
tools known as Control Flow tasks to connect to 
various types of data sources and destinations. Two 
such Control Flow tasks that we can use for our 
solution here are the Web Service task and the Script 
task. Both these objects can connect to a web service 
to retrieve data. 

To illustrate this integration, I’ve created a sce- 
nario that uses a WSS site containing a list with 
some pre-populated data. I created the site by using 
the Absence Request and Vacation Schedule Man- 
agement application template, which is available for 
download from Microsoft (technet.microsoft.com/ 
en-us/windowsserver/sharepoint/bb407286.aspx), and 
I populated the list, named Absences, with some 
sample data. 
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TABLE |: Variable Names and Default Values 


Variable Name Type 

listURL String 
listName String 
wsOutput String 


Scope Default Value (example) 
Package http://<SiteURL >/_vti_bin/lists.asmx 
Package <ListName> 


Package N/A 


Setting Up a Project and 
Configuring a Script Task 

To create a new SSIS project, open SQL Server 
Business Intelligence Development Studio. BIDS is 
a subset of VS 2005 and is included in the default 
installation of SQL Server 2005 Standard Edition 
or higher. Create a new project of type Business 
Intelligence Projects using the Integration Services 
Project template. 

In the project window, expand the toolbox on the 
left side. This displays a long list of Control Flow items, 
pre-packaged SSIS items that can be combined to form 
a data-integration application and dragged and dropped 
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The Add Web 
Reference window 
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Navigate to a web service URL and click Add Reference to add all the available services. 


URL: [Feto:fired-sreot [absences], vti binflsts. acne x] a Go 


The following operations are supporte: 


© ApplvContentTypeToList 


HE 


Web services Found at this URL: 


d. For a formal definition, 


= 


Web reference name: 


| waLists| 


into the project. The Web Service task is one such item 
on this list and seems to be the obvious choice. 

But it can’t support web services that accept com- 
plex input parameters such as CAML queries, and to 
query a SharePoint list, we need to be able to provide 
a CAML fragment as an input parameter. For that 
matter, it can't be used to call a majority of SharePoint 
web services. (It is, however, compatible with some of 
the simpler web-service methods such as Webs.) 

Instead, we'll use the Script task, a flexible and 
extensible SSIS task item that lets you write NET 
code and provides a bridge between SSIS objects and 
external applications. In the scenario here, we want 
the Script task to call up the SharePoint Lists web 
service, read certain columns of data from a specific 
list in a SharePoint site, and store the resulting data in 


a SQL Server database destination. Follow the steps 
below to add a Script task to your SSIS project and 
configure it. 

Step 1: Set up SSIS variables needed in the script. 
The variables are used to pass parameters to the web- 
service method call, as well as to capture result data 
from the call. Table 1 shows some variable names and 
default values. Be sure to set default values and replace 
contents in <SiteURL> and <ListName> to match 
your environment. <SiteURL> refers to the address 
of the SharePoint site root that contains the list. 

Step 2: Drag and drop a Script task item from 
the toolbox into the Control Flow tab of the SSIS 
package. Double-click the Script task item you just 
added to open the Script Task Editor window and set 
the script variable properties as follows: 

e ReadOnlyVariables: list URL, listName 
* ReadWriteVariables: wsOutput 


Take note of other properties in this window. The 
default script language is Visual Basic .NET. This 
means you can take advantage of powerful features 
of the .NET platform in your code, such as the web- 
service framework. 

Step 3: Click the Design Script button in the Script 
Task Editor window to invoke the script editor. The 
Visual Studio for Applications (VSA) window pops 
up, which is where you write the code for the SSIS 
task. 

Step 4: To leverage the .NET web-service frame- 
work in the script, you need to add appropriate refer- 
ences first. A web-service reference file is a separate 
.NET class file. You can generate the class file in a 
separate VS project and import the file into the SSIS 
script task. 

To do this, start up VS and create a new Visual 
Basic .NET Windows Application project by navi- 
gating to Project, Add Web Reference. In the Add Web 
Reference window, which Figure 1 shows, specify the 
URL of the SharePoint site where the list data resides 
and provide a web reference name (for example, 
wsLists), then click Add Reference. VS generates the 
necessary class files to create the infrastructure to call 
the SharePoint Lists web service. 

Step 5: Save and close the Visual Basic (VB) project, 
then reopen the SSIS project. Browse the folder struc- 
ture of the VS project in Windows Explorer. 

Open the folder location Web References Wists in 
the project directory, and you will see the files that 
VS generated automatically. To import the class file, 
right-click the project in Solution Explorer and select 
Add from the context menu, followed by Existing 
Item, then navigate to the directory containing 
reference.vb. 

Step 6: Back in SSIS script editor, add the fol- 
lowing references to the project: System. Web. Services 


SQL Server Magazine * www.sqlmag.com 


INTEGRATING SHAREPOINT DATA A i 


and System.Xml. Open the Reference.vb file you 
imported in Step 4 above, and add the following code 
at the namespace declaration towards the top: 


Imports System.Web.Services 
Imports System. Xi] 
Imports Microsoft.SqlServer.Dts.Runtime 


Note that the namespace of the Reference.vb file is 
called “wsLists” (or whatever name you specified in 
Step 4). Scroll down in the code and locate the Public 
Sub New() procedure. Replace the line of code that 
starts with “Me.URL” as follows: 


Me.Url = Dts.Variables("list 
URL") .Value. ToStringQ 


Remember, the SSIS variable, listURL, specifies the 
location of the SharePoint list. The above line of code 
will tell the web service where to look for SharePoint 
list data. 

Step 7: Now that you've set up the references and 
variables, it’s time to write the actual VB code to call 
the web service. Open the ScriptMain file in the VSA 
project window. Replace contents of the code block 
Public Sub Main() with the code in Listing 1. 

The global variable wssListService refers to the 
wsLists web-service namespace, as defined in the 
Reference.vb class. Note that this Main subroutine 
calls another function, CallWebService, and col- 
lects the output of the function in a string variable, 
outputXML. Finally, it sends the output to the SSIS 
variable, wsOutput. 

Step 8: Add code from Listing 2 for the function 
CallWebService. The GetListItems method of the 
wsLists Web service returns an XML dataset that 
matches the query parameters. The query parameters 
specified in the above method call are 
* listName: string value representing the name of the 

SharePoint list to be queried. 
* ndViewFields: CAML frag- 

ment of the fields to be 

retrieved from the list. Each 


SharePoint list via the web-service method call, it's 
a good idea to verify that the script task performs 
as expected. Remember, the extracted XML data is 
stored in an SSIS variable, wsOutput. One way to 
verify it will work is to inspect the value in the variable 
wsOutput populated by the Script task in debug mode, 
using the following steps. 

Step 1: Go back to the Public Sub Main() code 
block you created above. Insert a debug breakpoint on 
the line with code Dts.Variables(^wsOutput"). Value 
= outputXML. Save and close the VSA window and 
return to the SSIS package window. 

Step 2: Execute the package by going to menu 
Debug, Start Debugging. 


LISTING |: Code to Call the Web Service 


Dim wssListService As New wsLists.Lists 

Public Sub Main) 
Dim taskResult As Integer = Dts.Results.Success 
Dim stringListName As String = Dts.Variables("listName") .Value.ToString 
Dim outputXML As String 


Try 
wssListService.Credentials = System.Net.CredentialCache.DefaultCredentials 


If stringListName.Length <> Ø Then 
outputXML - CallWebService(stringListName) 
Dts.Variables("wsOutput").Value = outputXML 
taskResult = Dts.Results.Success 

Else 
Throw New ApplicationException("Invalid SharePoint List") 
taskResult = Dts.Results.Failure 

End If 


Catch ex As Exception 
Dts.Events.FireError(@, String.Empty, ex.Message, String.Empty, 9) 
taskResult - Dts.Results.Failure 

End Try 


Dts.TaskResult = taskResult 
Return 
End Sub 


LISTING 2: Code to Call the Function CallWebService 


. . Function CallWebService(ByVal listName As String) As String 
field to be included in the Dim outputNode As Xml.XmlNode 


output should be in its own 
FieldRef element. 

* ndQuery: CAML fragment of 
the query conditions (similar 
to “where” clause in SQL). 
This parameter is optional. 

e stringRowLimit: specifies the 
number of rows to be returned 
in the query. This is also 
optional. 


Debugging 


Now that you've set up the Script 
task to extract data from the 
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Dim xmlDoc As Xml.XmlDocument = New System.Xml.XmlDocument() 
Dim ndQuery As Xml.XmlNode = xmlDoc.CreateNode(Xml .XmlNodeType.Element, "Query" , "") 
Dim ndViewFields As Xml.XmlNode = xmlDoc.CreateNode(Xml.XmlNodeType.Element, "ViewFields","") 


Dim ndQueryOptions As Xml.XmlNode - xmlDoc.CreateNode(Xml.XmlNodeType.Element, "QueryOptions"," 


Dim stringRowLimit As String - "2000" 
Dim outputString As New System.Text.StringBuilder() 


ndQuery.InnerXml = "<Query></Query>" 
ndQueryOptions.InnerXm] = "" 
ndViewFields.InnerXml = "<FieldRef Name-'Title'/»" + _ 
"«FieldRef Name-'Start x0020 Date'/»" + _ 
"«FieldRef Name-End x0020 Date'/»" + _ 
"«FieldRef Name='Description'/>" 
outputNode = wssListService.GetListItems(listName, String.Empty, ndQuery, _ 
ndViewFields, stringRowLimit, ndQueryOptions) 
outputString.Append("«z:data xmlns:z-'£RowsetSchema' >") 
outputString.Append(outputNode.Item("rs:data").InnerXml) 
outputString.Append("</z:data>") 
Return outputString. ToString() 


End Function 
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LISTING 3: XML Schema 


«xs:schema xmlns:nsi-"4RowsetSchema" attributeFormDefault-"unqualified" elementFormDefault-"qualified" 


targetNamespace="#RowsetSchema" xmlns:xs-'"http://www.w3.0rg/2001/XMLSchema"» 
«xs:element name-"data"» 
«xs:complexType» 
«xs:sequence» 
<xs:element minOccurs="9" maxOccurs="unbounded" name="row"> 
«xs:complexType» 
«xs:attribute name-"ows Title" type-"xs:string" use-"optional" /» 


«xs:attribute name-"ows  ModerationStatus" type-"xs:unsignedByte" use-"optional" /> 
«xs:attribute name-"ows Level" type-"xs:unsignedByte" use-"optional" /» 


«xs:attribute name-"ows EventDate" type-"xs:string" use-"optional" /» 
«xs:attribute name-"ows AVMStatus" type-"xs:string" use-"optional" /» 
«xs:attribute name-"ows ID" type-"xs:unsignedByte" use-"optional" /» 


«xs:attribute name-"ows owshiddenversion" type="xs:unsignedByte" use-"optional" /> 


«xs:attribute name-"ows UniqueId" type-"xs:string" use-"optional" /» 
«xs:attribute name-"ows FSObjType" type-"xs:string" use-"optional" /> 
«xs:attribute name-"ows Created" type-"xs:string" use-"optional" /> 


«xs:attribute name-"ows AVMAbsenceType" type-"xs:string" use-"optional" /> 


«xs:attribute name-"ows EndDate" type-"xs:string" use-"optional" /> 


«xs:attribute name-"ows Employee x0020 Name" type-"xs:string" use-"optional" /> 


«xs:attribute name-"ows FileRef" type-"xs:string" use-"optional" /> 
«xs:attribute name-"ows MetaInfo" type="xs:string" use-"optional" /» 


«xs:attribute name-"ows Description" type-"xs:string" use-"optional" /> 
«xs:attribute name-"ows fAllDayEvent" type-"xs:unsignedByte" use-"optional" /> 


</xs:complexType> 
</xs:element> 
</xs: Sequence> 
</xs:complexType> 
</xs:element> 
«/xs:schema» 


E XML Source Editor 
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Data access mode: 
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Figure 2 


Step 3: The VSA window will pop up automatically, 
revealing the code inside the Script task and pausing at 
the breakpoint you inserted on Step 1 above. 

Step 4: Go to the VSA window menu Debug, 
Windows, Locals. The Locals window appears at the 
bottom of the workspace, listing all variables in cur- 
rent scope, including outputX ML. 

Step 5: In the Locals window, go to the value 
column of outputXML variable and invoke XML 
Visualizer by clicking the eyeglass icon menu. Export 
the text in the XML Visualizer window to a text editor 
such as Notepad. Save the text as an XML file. 

Step 6: Back in VSA, close the XML Visualizer 
window and continue the code execution from the 
breakpoint. When the package has finished executing, 
go to menu Debug, Stop Debugging. 


Connection Manager 
in the XML Source 
Editor window 
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Creating an Inline 
Schema 

The XML document you created 
in Step 5 above contains a sample 
of SharePoint list data in a fully 
valid XML format. This docu- 
ment, however, doesn’t contain 
an inline schema. Now you will 
need to supply sample data as 
well as a schema file (XSD) that 
defines the structure of the XML 
data. You can then use an XML- 
compatible application such as 
Microsoft Office Excel 2007 or 
Microsoft Office InfoPath 2007 
to create a schema based on the 
sample data. The steps below 
illustrate how you can use Excel 
to generate a schema from the 
XML output document that you 
created earlier. 

Step 1: Open the XML file using Excel 2007. 
When prompted, opt to open the XML file as an 
XML table. 

Step 2: Excel will notify you that the XML docu- 
ment doesn't have a schema and that Excel will create 
one for you. Click OK. 

Step 3: The schema generated by Excel isn't visible 
in the UI. However, it's accessible programmatically 
through Visual Basic Editor. To launch a Visual Basic 
Editor window in Excel, press Alt+F11 (or in ribbon 
bar go to Developer, Visual Basic). 

Step 4: Locate the VBAProject tree on the left of 
the Visual Basic Editor workspace and double-click 
ThisWorkbook. 

Step 5: In the code area for ThisWorkbook, create 
a procedure as follows and execute it. 


Sub ShowSchemaText () 
Sheet2.Range(“A1”).Value = 
ThisWorkbook.XmlMaps (1) . Schemas (1) . XML 
End Sub 


This extracts the XML property of the first schema 
contained in the first XmlMaps collection and places 
the corresponding text in cell Al of Sheet2. Since the 
workbook currently has only one XML document, we 
can safely assume the first items in both XmlMaps and 
Schemas collection are the items we want. Export the 
text value in cell Al of Sheet2 to a text file and save it 
with an .XSD extension. 

Listing 3 shows the XSD schema file contents for 
the Absence list I used in this example. In the next 
section you will see how to transfer the data in this 
variable to a SQL database table using the Data Flow 
Task item and XML Source adapter. 
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SSIS Data Flow Task 

The Data Flow task is the most common of the SSIS 
task items as it’s capable of facilitating the transfer of 
data from a variety of source and destination types. 
In our particular situation, the source data is of 
XML type whereas our destination is a SQL Server 
database table. Two of the many data adapters built 
in to the Data Flow task—XML Source adapter and 
SQL Server Destination adapter—read the XML data 
from the SSIS variable and write to the SQL Server 
destination table, respectively. To set up such a Data 
Flow task, follow these steps: 

Step 1: Drag and drop a Data Flow task item 
from the toolbox into the Control Flow tab of the 
SSIS package. Connect the Data Flow task item to the 
existing Script task item using the precedence constraint 
(the green line) that originates from the Script task. 

Step 2: Go to the Data Flow tab of the SSIS 
package, which will be empty. From the SSIS toolbox 
on the right, drag and drop the XML Source adapter 
(located under Data Flow Sources), the Data Conver- 
sion adapter (under Data Flow Transformations), and 
the OLE DB Destination adapter (under Data Flow 
Destinations), to the empty area. 

Step 3: Double-click the XML Source adapter 
component to set its properties. In the Connection 
Manager tab in the XML Source Editor in Figure 2, 
you can see the following properties have been set: 

* Data access mode: XML data from variable 

* Variable name: User::wsOutput 

* XSD location: <XSDFilePath>, where <XSDFile- 
Path? corresponds to the path of the XSD schema 
file you created earlier using Excel. 


Step 4: Create the destination table manually for 
the data in SQL Server, if the table doesn't exist, by 
executing the CREATE TABLE SQL script elsewhere 
(not in the SSIS package). I used the SQL script in 
Listing 4 to create my destination table in Adventure- 
Works database. 

Step 5: In the Data Flow tab, connect the XML 
Source adapter to the Data Conversion transforma- 
tion adapter using the green arrow (Data Flow Path) 
that originates from the XML Source adapter. The 
reason for using the Data Conversion adapter here is 
to convert the two date columns in the source XML 
data (Start Date and End Date columns) from string 
values to date/time values. In the Data Conversion 
Transformation Editor, which Figure 3 shows, I set 
the data type for these two input columns to data type 
database timestamp (DT DBTIMESTAMP], which 
corresponds to the SQL datetime data type. 

Step 6: Back in the Data Flow tab, connect the 
Data Conversion Transformation adapter to the SOL 
Server Destination adapter using the Data Flow Path. 
Open the OLE DB Destination Editor window and set 
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the Connection Manager proper- LI 
ties to point to the destination table 


STING 4: SQL Script to 


Create a Destination Table 


you created in Step 4. In the Map- E 
pings page, map the input columns CREATE TABLE [dbo]. [Absences] ( 


(ie, output columns from the 
Data Conversion transformation) 
to appropriate output columns (1.e., 


the SQL Server destination table), 
) 


as Figure 4 shows. GO 


» + Data Conversion Transformation Editor 


| ows Title 


ows_EndDate 


| ows EventDate.Converted database timestamp [DT_DBTIMESTAMP] 
ows_EndDate.Converted database timestamp [DT_DBTIMESTAMP] 


[ID] [int] IDENTITY(1,1) NOT NULL, 
[Title] [nvarchar](255) NOT NULL, 
[EmployeeName] [nvarchar](255) NULL, 
[AbsenceType] [nvarchar](255) NULL, 
[StartDate] [datetime] NULL, 
[EndDate] [datetime] NULL, 
[Description] [nvarchar](255) NULL 
ON [PRIMARY] 


Lox! 


Configure the properties used to convert the data type of an input column to a different data type. Depending on the data 
type to which the column is converted, set the length, precision, scale, and code page of the column. 


Available Input Columns 


ows Employee x0020 Name 
ows AVMAbsenceType 
ows_EventDate 


ows_Description 


Configure Error Output... | 


Figure 3 


Data Conversion Transformation Editor 


E- OLE DB Destination Editor 


Configure the properties used to bulk copy data into a local instance of the Database Engine, 


Connection Manager 
Mappings 


Advanced Available Input Columns 


owt, Employee «0020 Name 
ows_AVMAbsencel ype 


ows_EventDate, Converted 
ows_EndDate, Converted 


ows_Title 
ows_Employee_x0020_Neme 
ows AVMAbcenceType. 

ows EventDate,Converted 
ows EndDate,Converted 
ows Descriphion 


EmployeeName 
AbsenceType 
StartDate 
Enddate 
Description 


Figure 4 
OLE DB Destination Editor 


[oe] ww | we | 
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INTEGRATING SHAREPOINT DATA 


Parameter Mapping 
Result Set 
Expressions 


Figure 5 


JÈ Execute SQL Task Editor 


Biel E 


Ü coarse the properties required to run SQL statements and stored procedures using the selected 


n 


E General 
Name Execute SQL Task 
Description Execute SQL Task 
E Options 
TimeOut 0 
CodePage 1252 
E Result Set 
ResultSet None 
E SQL Statement 
ConnectionType OLE DB 
Connection RED-SRSO1.AdventureWorks 
SQLSourceType Direct input 
SQLStatement TRUNCATE TABLE Absences 
IsQueryStoredProcedure False 
BypassPrepare True 
SQLStatement 
Specifies the query to be run by the task. 


Browse... | Build Query... | Parse Query | 
aN) 


Configuring SQL Statement properties in the Execute SQL Task Editor screen 


IÆ New Job Step 


A General 
1# Advanced 


Server: 
RED-SASO1 
Connection: 


3SHARP\anupk 
=} View connection properties 


Ready 


Figure 6 


BEE 
- ü Help 

Step name: 
[impor SharePoint Data 
Type: 
[sau Servet Integration Services Package x] 
Bun as: 
[SQL Agent Service Account z| 

Set values | Verification | Command ine | 

General | Configurations | Command fles | Data sources | Execution options | Logging | 

Package source: File system E 


Server [ —— Ki 


[- Log on to the server 


User name 


Password 


Package: 


fers\i integration Services Project! integration Services Project] \bin\Package.dtsx — ... | 


Automating the import process via the New Job Step screen 
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Destination SQL Server 

You have now set up the SSIS package to import data 
from the SharePoint list to the SQL Server destination 
table. Save the package and give it a test run by hitting 
F5. If everything goes well, you will see all the package 


items highlighted in green upon execution. You should 
also see the destination table in SQL Server populated 
with data from the SharePoint list. 

Keep in mind that this SSIS package is currently 
set up to import all rows of data from the source list 
each time it is executed, whether the rows of data 
already exist in the destination table or not. This can 
create duplicate rows of data in the destination table 
each time the package runs. One solution is to trun- 
cate contents of the destination table at the beginning 
of package execution. Another solution is to use 
the Slowly Changing Dimension SSIS transforma- 
tion item, which imports only new rows of data and 
updates changes to existing rows of data but is beyond 
the scope of this article. For now, just add the Execute 
SQL task item before the Script task in the Control 
Flow tab of the SSIS package. Set the connection 
property to connect to the destination database, and 
type in the following SOL command under the SQL- 
Statement property, which Figure 5 shows: 


"TRUNCATE TABLE Absences" 


Automating the Import 
Process 

The SSIS package you created above can be run on a 
regular basis so that any data changes in the source 
(Le, the SharePoint list) are reflected in the SQL 
Server table. Using the SQL Server Agent, you can 
set up a job to execute the package at set intervals, 
such as hourly or daily. To set up a SQL Server Agent, 
follow these steps: 

Step 1: Using SQL Server Management Studio, con- 
nect to the SQL Server server that contains the database 
and the destination table we're working with. 

Step 2: In Object Explorer, right-click SQL Server 
Agent and select New Job. Type in a job name under 
General page. 

Step 3: Go to the Steps page and add a new step 
in the New Job Step screen, which Figure 6 shows. 
Ensure that the package path points to the location 
of the .dtsx file you created earlier. 

Step 4: Go to the Schedules page and add a new 
schedule for the job. Select the Recurring schedule type 
to repeat the task on scheduled intervals. Click OK to 
save. 


No More Data Islands 
You have now scheduled the SSIS package to be 
executed automatically during regular intervals, as 
set up in the task schedule. As I hope you can see, the 
web-services approach to integrating SharePoint data 
with SQL Server is relatively painless, and the result 
is well worth it, ensuring that SharePoint is no longer 
an isolated data island in your system. SOL 
Instant Doc ID 101656 
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LOT: 


> IR. 


In Lesson IO, learn how to create stored 
procedures and why youd want to 


term you'll often encounter when talking 

A with DBAs or reading SQL Server docu- 

mentation or literature is stored procedures. 

A stored procedure is simply a compiled database 

object that contains one or more T-SQL statements. 

Although SQL Server has many system-provided 

stored procedures, you can create your own. 

Creating your own stored procedures offers several 
advantages: 

* You can avoid having to store all your T-SQL code 
in files. Stored procedures are stored in the database 
itself, so you never have to search through files to 
find the code you want to use. 

* You can execute a stored procedure as often as you 
like from any machine that can connect to the data- 
base server. 

* [f you have a report that needs to be run frequently, 
you can create a stored procedure that produces the 
report. Anyone who has access to the database and 
permission to execute the stored procedure will be 
able to produce the report at will. They don't have 
to understand the T-SQL statements in the stored 
procedure. All they have to know is how to execute 
the stored procedure. 

* You can enforce database security through stored 
procedures. You can grant users permission to exe- 
cute a stored procedure but not permission to access 
the underlying tables. 

* Although creating applications is beyond the scope 
of this lesson, centralizing code in a stored proce- 
dure lets you reduce the amount of redundant code 
in your applications and insulate the applications 
from the effects of database schema changes. 


The Prerequisites 

Before I show you how to create and execute stored 
procedures, make sure your MyDB database contains 
the following tables so that you can run the sample 
code in this lesson: 


SQL Server Magazine * www.sqlmag.com 


* The Employee table created in Lesson 3 

* The Movie table created in Lesson 5 

* The Genre table created in Lesson 5 

* The revised MovieReview table created in Lesson 6 
(and not the original MovieReview table created in 
Lesson 4) 


If you haven't created these database objects, you'll find 
the code in the 100844.zip file. To download this file, go 
to www.sqlmag.com, enter 100844 in the InstantDoc 
ID text box, and click the 100844.zip hotlink. 


How to Create Stored 
Procedures 
Before you create a stored procedure, you 
need to decide what you want your stored 
procedure to do. In previous lessons, a series 
of T-SQL statements were used to populate the 
Employee, Movie, and MovieReview tables, so I'll show 
you how to create three simple stored procedures that 
you can use to insert new records into these tables. I'll 
also add some code to prevent duplicate records from 
being inserted into the tables. 

To create a stored procedure, you use the CREATE 
PROCEDURE command. This command's basic 
syntax is 


CREATE PROCEDURE 
SchemaName . ProcedureName 
C 
@Parameterl datatype [OUTPUT], 
@Parameter2 datatype [OUTPUT] 
) 
AS 
SQL Statement 1 
SQL Statement 2 


The first portion of the CREATE PROCE- 
DURE statement is where you specify the name of 


Bill McEvoy 


(bill@ cookingwithsql.com) is Master Chef/DBA 
for the Cooking with SQL website. He's been a 
DBA since SQL Server 4.2, and he specializes 
in batch processing and performance tuning. 


ORE on the WEB 


Download the code at 
InstantDoc ID 100844. 
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LISTING I: The InsertEmployee Stored 
Procedure 


IF (object id('InsertEmployee')) is NOT NULL 
DROP PROCEDURE InsertEmployee 
GO 


(A)CREATE PROCEDURE InsertEmployee 
C 
GFirstName varchar(15), 
@LastName varchar(15), 


the stored procedure 
( ProcedureName) and 
optionally the schema 
to which it will belong 
( SchemaName). You 
can read more about 
schemas in “T-SQL 


@Salary int, 
@HireDate datetime 


) 
AS 


-- Section 1: Define and initialize the local 


-- variable. 
DECLARE @count int 
SELECT @count = Ø 


-- Section 2: Determine whether the record already 


-- exists. 


SELECT @count = COUNT(*) 


FROM Employee 


WHERE FirstName = @FirstName 
= @LastName 


AND LastName 


-- Section 3: Insert the record if it doesn’t already 


-- exist. 
IF (@count = 9) 
BEGIN 


INSERT INTO Employee VALUES 
(@FirstName, @LastName, @Salary, @HireDate) 
PRINT ‘Employee record inserted’ 


END 
ELSE 


PRINT ‘Employee record already exists...' 


GO 


“T-SQL 101, Lesson 1,” 
“T-SQL 101, Lesson 2,” 
“T-SQL 101, Lesson 3,” 
“T-SQL 101, Lesson 4,” 
“T-SQL 101, Lesson 5,” 
“T-SQL 101, Lesson 6,” 
“T-SQL 101, Lesson 7,” 
“T-SQL 101, Lesson 8,” 
“T-SQL 101, Lesson 9,” 
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LEARNING PATH 


SQL SERVER MAGAZINE RESOURCES 
To read the previous T-SQL 101 lessons, go to 


101, Lesson 6” (August 
2008, InstantDoc ID 
99448). 

The second por- 
tion is where you 
specify optional input 
and output parame- 
ters. These parameters 
are defined within the 
parentheses and are 
separated by commas. 
Each parameter is 
prefixed with the @ 
symbol and must have 
a data type speci- 
fied. Each parameter 
specified will accept an 
input value, but if you 
want to return a value 
back to the calling 
EXECUTE statement 
(i.e., the command that caused the stored procedure to 
run) you must specify the OUTPUT argument for that 
parameter. You can have as many input and output 
parameters as needed. 

The third portion of the CREATE PROCEDURE 
statement begins with the AS keyword. Often referred 
to as the main body of the stored procedure, this is 
where you specify the T-SQL statements that you want 
to execute. You can have as many T-SQL statements as 
needed. (For more information about the CREATE 
PROCEDURES syntax, go to msdn.microsoft.com/ 
en-us/library/ms187926(SQL.90).aspx.) 

Let's look at an example of a CREATE PRO- 
CEDURE command. 
Suppose that three new 
employees have joined 
the fictitious company in 
which employees review 
movies discussed in pre- 
vious lessons. Listing 1 
shows the code that cre- 
ates a stored procedure 
named InsertEmployee, 
which you can use to 
insert records into the 
Employee table. Because 
the Employee table has 
four columns (i.e, First- 


InstantDoc ID 97724 
InstantDoc ID 98105 
InstantDoc ID 98315 
InstantDoc ID 98711 
InstantDoc ID 99137 
InstantDoc ID 99448 
InstantDoc ID 99765 
InstantDoc ID 99832 
InstantDoc ID 100152 


Name, LastName, Salary, and HireDate), the proce- 
dure has four input parameters, as callout A shows. 
When you execute this stored procedure, you'll pass 
in the employee's first name, last name, current salary, 
and hire date using the @FirstName, @LastName, 
@Salary, and @HireDate parameters, respectively. 

The InsertEmployee stored procedure’s main body 
has been broken down into three sections. Each sec- 
tion has a comment that provides a brief description 
of what that section of code accomplishes. Comments 
begin with a double hyphen (--). They’re for documen- 
tation purposes and not execution. (For more infor- 
mation about comments, go to msdn.microsoft.com/ 
en-us/library/ms181627(SQL.90).aspx.) 

In the first section, a local variable is defined and 
initialized. Local variables hold a single data value 
of a specific data type. (For more information about 
local variables, go to msdn.microsoft.com/en-us/library/ 
ms187953(SQL.90).aspx.) Although local variables aren't 
required in stored procedures, InsertEmployee uses a 
local variable named @count in the second section. 

After @count is defined as having an integer value, 
it's initialized to 0. Although SQL Server doesn’t 
require you to initialize variables, it’s a good habit 
to get into, especially when you'll be joining, or con- 
catenating, those variables with strings. On several 
occasions I’ve helped developers troubleshoot their 
code because they couldn't determine why there was 
no output being generated. As it turns out, they were 
taking an uninitialized variable and appending it to a 
string. The variable contained a NULL value, which by 
default causes SQL Server to nullify the entire string. 

The second section contains the code that pre- 
vents duplicate records from being inserted into the 
Employee table. A SELECT statement counts the 
number of Employee table records whose FirstName 
and LastName fields match the @FirstName and 
@LastName input parameters. The @count variable 
stores that number. 

The third section of the stored procedure starts off 
with an IF...ELSE statement. If the value stored in 
the @count variable is 0, then the statements inside the 
BEGIN...END block will execute. The BEGIN and 
END keywords are only necessary if you have more 
than one statement to execute. In this case, there are 
two statements. The first statement inserts the record 
into the Employee table. The second statement prints 
a message stating that the record was inserted. 

If the number stored inside the @count variable is 
anything other than 0, then the statement following the 
ELSE keyword will execute. Because there’s only one 
statement following the ELSE keyword, you don’t need 
to use a BEGIN...END block. (You can read more 
about IF...ELSE statements at msdn.microsoft.com/ 
en-us/library/ms182717(SQL.90).aspx.) 

The last line of code in the InsertEmployee stored 
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procedure is the GO command. It’s a good idea to 
follow each CREATE PROCEDURE statement with 
the GO command. This command tells SQL Server 
that there are no more statements for the stored proce- 
dure and lets you create multiple stored procedures in 
one script. 

After you’ve written the stored procedure, you need 
to add it to the database. To do so, simply run the code 
in the query editor in SQL Server 2005’s SQL Server 
Management Studio (SSMS) or SQL Server 2000s 
Query Analyzer. 


How to Execute Stored 
Procedures 

To execute a stored procedure, you need to use 
the EXECUTE command followed by the name 
of the stored procedure and the necessary input 
parameters. In this case, each input parameter 
needs to be followed by an equal sign and the value 
being passed in. For example, to add a record to 
the Employee table with the InsertEmployee stored 
procedure, you'd use a command such as 


EXECUTE InsertEmployee 
@FirstName = 'Graham', 
@LastName = 'Oatley', 
(Salary = 99000, 
@HireDate = '2008-01-01' 


The results would look like 


(1 row(s) affected) 
Employee record inserted 


If you try running the same command again, you'll 
see the results of the record-duplication check: 


Employee record already exists 


Next, run the code in Listing 2 to add two more 
employee records to the Employee table. To con- 
firm that the stored procedure worked as expected 
and actually inserted all three employee records, 
execute the statement 


SELECT * 
FROM Employee 
WHERE HireDate >= '2008' 


Figure 1, page 40, shows the results. Note that there 
are five columns even though the stored procedure 
adds values to only four columns. As I explained in 
“T-SQL 101, Lesson 2” (April 2008, InstantDoc ID 
98105), the first column is an identity column. SQL 
Server automatically adds that column's values. 
The value is incremented with each record being 
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LISTING 2: Code that Adds More 


inserted. Because the Employee Employees to the Employee Table 


table had eight existing records, 
EXECUTE InsertEmployee 
the three new records have the A TIU NS 


EmployeeID values of 9, 10, @LastName = 'Brodie', 
and 11 @Salary = 1459000, 
: @HireDate = '2008-02-190' 
Now let's suppose that the 
A EXECUTE InsertEmployee 
three new employees wanted in 


@FirstName = 'Pierre', 


on reviewing the movies they @LastName = 'LaMontagne', 
watched, so they diligently sub- 
mitted their movie reviews. Some 
of those reviews were for movies 


@Salary = 145000, 
@HireDate = '2008-02-10' 


2» 
Sot limes t 2 
SQt Server fdlifórmance 
problem is GiByious. 


Usually, it's harder to pinpoint. 
Amazing what you can accomplish once 
you have the information you need. 


When the source of a database-driven application 
slowdown isn't immediately obvious, try a tool that 

can get you up to speed. One that pinpoints database 
bottlenecks and calculates application wait time at each 
step. Confio lets you unravel slowdowns at the database 
level with no installed agents. And solving problems 
where they exist costs a tenth of working around it by 
adding new server CPU's. Now that's a vision that can 
take you places. 


A smarter solution makes everyone 
look brilliant. e NI 


Download your FREE trial of Confio Ignite" at www.confio.com/sqimag 
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of a report. The report 
lists the movies in the 
genre you specify and 
provides each movie's 
average rating, which is 
calculated from the rat- 


HireDate 


2008-01-01 00:00:00.000 
2008-02-10 00:00:00.000 
2008-02-10 00:00:00.000 


EmployeeID FirstName 


LastName 


Graham 
Axel 
Pierre 


LaMontagne 145000 


Figure | 


Confirmation that the 
InsertEmployee stored 
procedure worked 


not in the Movie table, so records need to be added to 
both the Movie table and MovieReview tables. To insert 
data into these tables, you can use the InsertMovie and 
InsertMovieReview stored procedures. These stored 
procedures are very similar to the InsertEmployee 


LISTING 3: The ShowMovieReviews 
Stored Procedure 


IF (object id('ShowMovieReviews')) IS NOT NULL 
DROP PROCEDURE ShowMovieReviews 

GO 

CREATE PROCEDURE ShowMovieReviews 

C 
@Genre varchar(12) 

X 

AS 


SELECT m.Movie, g.Genre, 
LEFT(REPLICATE('* ',AVG(mr.Stars)),10) AS 'Stars', 
COUNT(*) AS 'Reviews' 

FROM MovieReview mr 

INNER JOIN Movie m ON m.MovieID - mr.MovieID 

INNER JOIN Genre g ON g.GenreID - m.GenreID 

INNER JOIN Employee e ON e.EmployeeID - mr.EmployeeID 

WHERE Genre = @Genre 

GROUP BY Movie, Genre 

ORDER BY 3 DESC 


stored procedure, 
so I won't discuss 
how they work. You 
can find the code 
for these stored 
procedures in the 
InsertMovie.sql and 
InsertMovieReview 
.sql files, which are 
in the 100844.zip 
file. 

After you execute 
the code that adds 
the InsertMovie and 
InsertMovieReview 
stored procedures 


ings in the applicable movie reviews. 

The ShowMovieReviews stored procedure takes 
one input parameter: @Genre. For example, to gen- 
erate a report that shows all the thriller movies and 
their average ratings, you'd execute the code 


EXECUTE ShowMovieReviews 
@Genre = 'Thriller' 


Figure 2 shows the results. Similarly, to generate a 
report that shows all the sci-fi movies and their average 
ratings, you'd run the command 


EXECUTE ShowMovieReviews 
@Genre = 'Sci-Fi' 


Figure 3 shows the results. 
The Journey’s End 


It has been a long journey to get to this final lesson. In 
this series, you learned how to create tables and views, 


GO to the database, how to insert, update, and delete records in tables, and 

run the code in the how to summarize and aggregate data. You learned 

about the difference between inner 

Movie Genre Stars Reviews and outer joins and the various 

cio ica aia - eii du ca vumisc nin m E ee E E Lg wd - mE system functions that you can use 
Full Metal Bracket Thriller eee o 3 : . : 

Rollback To The Future Thriller Wo odo 3 to manipulate data in queries. In 

Server Room with a View Thriller eke 3 this final lesson, you learned how 


Figure 2 


The ShowMovie- 
Reviews stored 
procedure’s results 
for the thriller genre 


CodeToAddMoreMovies.sq]l file to add the three new 
movies to the Movie table. Then execute the code in 
the CodeToAddNewMovieReviews.sq] file to insert the 
new employees’ movie reviews in the MovieReview table. 
CodeToAddNewMovies.sql and CodeToAddNew- 


to tuck all those useful T-SQL 
commands inside a wonderful container known as a 
stored procedure. 

I've had a lot of fun with this series (you might 
have noticed my corny jokes strewn about). Believe 
it or not, I actually learned a lot. I keep a copy of 


MovieReviews.sq] are in the 100844 zip file. “T-SQL 101, Lesson 5: How to Join Tables” (July 


2008, InstantDoc ID 99137) on my 


MeV Gens TT Reviews desk at work for those sleepy days 
MET PRIM eee Coe eee ae when I just can’t remember the dif- 
ariots of Firewire Sci-Fi WU X NS 
SCSI Terminator: Connection Day Sci-Fi ee RRR 3 ference between an INNER JOIN 
The Day the Hard Drive Stood Still Sci-Fi Foe W^ 3 and a RIGHT OUTER JOIN. 
V for vendor Sci-Fi Bok 7 : 
I truly hope you found this 


Figure 3 


The ShowMovie- 
Reviews stored 
procedure's results 
for the sci-fi genre 
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How to Use Stored Procedures 
to Produce Reports 

So far the InsertEmployee, InsertMovie, and Insert- 
MovieReview stored procedures have been used to get 
data into the database. The code in Listing 3 creates 
a stored procedure, ShowMovieReviews, that you 
can use to get data out of the database in the form 


series useful. My goal was to intro- 
duce novices to the basics of the T-SQL language 
while encouraging them to dig deeper on their own. 
If you've been reading along, been doing the exercises, 
and become much more comfortable with T-SQL 
than when you first started, then I have only one thing 
further to say: Congratulations, you just graduated! 

SOU} 
InstantDoc ID 100844 
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Toad for SQL Server 


[ "ve never been that impressed with SQL Server Man- 
agement Studio (SSMS). To me, it's always felt like 
a last-minute decision made by marketing folks who 
were more interested in forcing the "Studio" branding 
than developer and DBA productivity. Service packs 
and SQL Server 2008 have ameliorated my sentiments 
a bit, and SSMS is much better than the tools that 
ship with most other database platforms, but I've long 
wondered whether I'd jump ship if I could find another 
suitable product that met my needs. After working with 
Toad for SQL Server for a few days, I’m finding myself 
thinking about making the switch. 

There are three versions of Toad for SQL Server. 
The Professional Edition costs $595 and, in my mind, 
provides the best cost to value ratio. Toad for SQL 
Server Xpert, $1,045, adds T-SQL tuning wizards and 
capabilities, and Toad for SQL Server Development 
Suite, $1,295, adds benchmarking capabilities. For this 
article, I reviewed the Development Suite, which is 
available as a free trial. 

Installing Toad for SQL Server 4.1 on 64-bit Win- 
dows Vista was a breeze, and the quick-start documents 
provided by Quest Software gave me a great overview 
of Toad before I used it. A connection manager made it 
easy to manage server connections. Toad's UI is much 
more responsive, smooth, and fluid than SSMS; once 
you've established a link to a database, using Toad feels 
like driving a Ferrari while SSMS, in comparison, feels 
like driving an old farm truck. Toad's object explorer 
fairly closely matches the functionality provided by 
SSMS' object explorer, but double-clicking an object 
such as a stored procedure or table instantly loads the 
script for the selected object along with an extremely 
impressive bevy of context-sensitive metadata. (You 
can see this metadata and the rest of Toad for SQL 
Server’s GUI in the online version of this review. Go to 
www.sqimag.com and enter InstantDoc ID 101690.) 

Metadata displayed includes object sizes, con- 
straints, permissions, and dependencies. This metadata 
represents one of Toad’s greatest strengths, as it shows 
exactly the kind of things that DBAs and other serious 
database professionals would want to see when inter- 
acting with objects. 

Another of Toad’s strengths is the impressive 
amount of customization it offers. It comes with a 
number of skins and themes to match user prefer- 
ences, and many aspects of the product can be 
heavily customized, including keyboard shortcuts. 
In addition to providing a variety of advanced 
preferences and configuration options, Toad gives 
features beyond the standard ones you would expect 
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from using SSMS. For example, all versions of Toad 
include log reader functionality (which allows you 
to review SQL Server's log files and do undo and 
replay operations), schema and data comparison 
and synchronization tools, and custom reporting and 
automation engines or frameworks. These tools alone 
can almost justify the cost of the Professional Edition 
in some environments. 

Of course, Toad comes with the basics that you 
need to manage and develop databases. There's an ER 
diagramming tool, a visual query builder, support for 
backup and restore operations, and the ability to view 
and manage jobs. In short, Toad offers everything you 
need to replace SSMS, plus an assortment of other 
tools and utilities. 

I noticed a few times that using Toad is close 
enough to SSMS in many ways that it feels like you're 
using a souped-up version of SSMS. However, many of 
the keyboard shortcuts and options you may be used to 
using in SSMS aren't where you'd expect them. Because 
everything is customizable, there's really no reason you 
couldn't get things working exactly how you'd want 
them. Also, I’m unsure what to think about Toad’s 
lack of support for T-SQL templates, despite how 
cool Toad's snippets support is. But one thing is for 
certain: Toad for SQL Server is definitely worth a look, 
and depending upon your work habits and needs, you 
might find that after using it a bit, you'll be tempted to 
kick SSMS to the curb. SQL 
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TOAD FOR SQL SERVER 


Pros: Mature and highly customizable management and development 


Michael K. 
Campbell 


(mike @ sqlservervideos.com) is a consul- 
tant with years of SQL Server DBA and 


development experience. He spends most 


of his time engaged in consulting, technical 
evangelism, and creating free online SQL 


Server videos. 


UI, many improvements over SSMS; includes log file reader, schema/data 
synchronization, and other tools; very fluid performance. 


Cons: Versions other than Professional are a bit high-priced for what they pro- 
vide; similarity to SSMS will cause long-term SSMS users a bit of pain as they 
adjust to different work flows, paradigms, and keyboard shortcuts. 


Rating: KAAK K 


Price: $595 for Toad for SQL Server Professional, $1,045 for Toad for SQL 
Server XPert, and $1,295 for Toad Development Suite for SAL Server (adds 
benchmarking capabilities). Support and bulk-pricing options are also 


available. 


Recommendation: Toad for SQL Server offers a number of great tools and 
options for DBAs and developers that facilitate database interaction and allow 
for extensive customization of work flows and the UI. SQL Server developers 
and DBAs should definitely give Toad for SQL Server Professional a test drive. 


Contact: Quest Software e www.quest.com 
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Download Central brings you the tools to meet your most critical SQL needs. 


A one-stop hub of countless free trial downloads from leading 
industry vendors, Download Central has done all the looking. 


All you have to do is see which tool is the best fit. 
And you get to do it all for FREE! 


Score Your Solution at Download Central! 
sqlmag.com/go/downloads 


Bytes from the Blog 


www.sqlmag.com/go/industrybytes 


Microsoft on VS 2008, VS 2010 
Team System 

icrosoft remains a developer’s company at 

heart, and the core developer's platform out of 
Microsoft is the Visual Studio (VS) line of products. 
In an interview with Windows IT Pro, Dave Mendlen, 
director of the developer tools group, shared his 
thoughts about the current Visual Studio 2008 release 
and the upcoming VS 2010 Team System release. 

Dave believes that VS 2008 and the .NET Frame- 
work 3.5 combine to make one of the strongest releases 
of the Microsoft development line-up in a long time. 
One really useful feature is the ability to perform multi- 
targeting. VS 2008 can target project builds to multiple 
versions of the .NET Framework. Another big focus 
for the VS 2008 release is the ability to enable Office 
application development. For the first time, VS 2008 
includes Visual Studio Tools for Office (VSTO), which 
was previously a separate product. 

Thanks to the popularly of Web 2.0, the VS 2008 
release features a number of web related enhance- 
ments. VS 2008 now sup- 
ports ASP.NET AJAX, 
which enables developers 


rich web applications 
by utilizing JavaScript 
on the web clients. vs Mi crosoft. 
2008 supports JavaScript 

IntelliSense and debugging. Other important web 
development enhancements include VS 2008's new 
HTML split design window and Cascading Style 
Sheet (CSS) support. The new split designer window 
enables you to see both the graphical view and the 
code of the web pages being developed. CSS allows 
the web designer to apply a consistent set of styles to 
a website. VS 2008 includes a CSS Outline Window, 
a Style Application Toolbar, and Apply Styles and 
Manage Styles windows. 

Other important enhancements to VS 2008 are 
ClickOnce Deployment and Language Integrated 
Query (LINQ). End users typically prefer the user 
interface offered by Windows applications, but Win- 
dows applications are more difficult to deploy than 
web applications. ClickOnce allows Windows appli- 
cations to enjoy the same type of application distri- 
bution advantages found in web applications. With 
ClickOnce, the user runs the application for the first 
time by clicking on a link on a web page that deploys 
the application to the user's system. Each subsequent 
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Visual Studio Team System 
to create and debug is One of the fastest 
growing products at 


time that the application runs, it automatically checks 
the URL for any newer versions and updates the 
application. 

VS 2008s new LINQ technology essentially allows 
data access queries to be expressed directly using either 
the VB or C# languages. This eliminates the need to 
embed SQL queries within your .NET data access 
applications. LINQ can be used to access both rela- 
tional databases (such as SQL Server) and other data 
sources (such as XML). LINQ can boost productivity 
through IntelliSense, which can prompt for informa- 
tion such as database object names. VS 2008’s LINQ 
support also allows it to perform syntax checking at 
design time rather than having to wait to execute to the 
query before you know if there’s any errors. 


Visual Studio 2010 Team 
Systems 

Visual Studio Team System is one of the fastest 
growing products at Microsoft. Team System is 
designed to incorporate all personnel involved in 
the application develop- 
ment process including 
architects, DBAs, devel- 
opers, and testers. Driven 
by customer demand, the 
VS 2010 Team System 
will incorporate the func- 
tionality that is currently 
found in the Visual Studio Team System 2008 Data- 
base Edition. In addition, the new release will provide 
a graphical UML designer and integrated unit testing 
where code changes can be tested before they are 
checked into Visual Studio Team System’s source 
control. The VS 2010 Team System release will also 
include a new Test Runner feature that enables testers 
to take a snapshot of the test scenario and use it to 
replay the test scenario helping to eliminate hard to 
reproduce bugs. 

To check out the new changes to VS 2008, visit 
the virtual lab or download a 90-day trial version 
at www.microsoft.com/visualstudio/en-us/try/default 
mspx. You can download a 90-day free trial edition of 
VS 2008 Team Suite from www.tinyurl.com/vs2008ts. 
The trial version of Team Foundation Server can also 
be found at www.tinyurl.com/vstfs. The completely 
free versions of VS 2008 Express can be found at 
www.microsoft.com/express/product/default.aspx. 

SQL 
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Left-Brain.com is the newly launched online superstore stocked with 
educational, training, and career-development materials focused on 
meeting the needs of IT professionals like you. 


7> SOL Server System Views i Featured Product: 
| | SQL Server 2008 System Views Poster 


Face the migration learning curve head on with the SQL Server 
2008 System Views poster. An updated full-color print diagram of 
catalog views, dynamic management views, tables, and objects for 
SOL Server 2008 (including relationship types and object scope), 
this poster is a must-have for every SOL DBA migrating to or al- 
ready working with SOL Server 2008. 


Order your full-size, print copy 
today for only $9.95*! 


*Plus shipping and applicable tax. 


brai 
www.left-brain.com SOLE: 


BACKUP AND RECOVERY 

Asempra Makes Data Protection Simple for SMBs 
Asempra's Business Continuity Server (BCS) provides application-aware backup and recovery to critical systems Got a great 
such as Microsoft Exchange Server and SQL Server. In the event of a system failure, BCS presents a virtualized new product? 
set of data back to the application, giving end-users access to data immediately, even while the actual data is — Send announce- 
being recovered in the background. In addition to quick recovery, BCS provides extremely granular control over ments to products@ 
recovery point objective (RPO). For more information, call 408-215-3875 or visit www.asempra.com. 


sqlmag.com. 
—Jeff James, 
Editor-in-Chief 


DATABASE MANAGEMENT 
Simplify Your Data Migration and Integration Process ar 
Software Labs’ xFusion Studio 4.0 can help you streamline your data. Jees — 1B 3:*60»9: 


migration and integration tasks, simplify your business processes to LIII m -— (© SekesOrderintegrationskstn" | 

* " ee . qR Action: Createa 

improve business decision making, and automate manual database tasks. connection to your data Caas 
age ] ] E E : . UR Action: Create anew 

In addition, xFusion Studio 4.0 includes new data integration and valida- samdari 


JB Action: Aggregate multiple 


tion rules, as well as a new email notification feature that lets you know if : s E 
the automated data migration process was stopped because of an error. 

xFusion supports three coding languages—C#, Visual Basic, and JScript. 
Pricing for xFusion Studio 4.0 starts at $5,000 for a two-user license plus 
support and maintenance. To learn more, call 916-773-6272 or visit www 
.softlabsco.com. 


one or more queries 


JR Action: Create a new data 
exporttemolate 
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ERP Sales Order Integration F 
This xFusion Studio file enables customization of Sales Order processing. 


DATABASE MANAGEMENT 


Task 
Simpana 8.0 Offers Global Embedded Software ps 
B : 1. Downloading Sales Order Information into Excel format 
Deduplication 1. Edt connection ERP System and specfy your connection sengs. 
The latest version of CommVault's data management software, Simpana 8.0, CM. cesta s Lice Loc ei M 


is now available. Simpana uses the Backup, Archive, Replication, Resource 


Management, and Search modules, which are individually licensed, to manage data and perform 


tasks such as content indexing and encryption. Simpana 8.0 provides enhanced recovery manage- = 26 
ment, data reduction, virtual server data protection, and content organization capabilities. This DATAB ASES - 
release offers the industry's first Global Embedded Software Deduplication, which lets you dedu- Mem 

plicate data no matter where it's stored in your organization and store deduplicated data on tape or — "***"r*o ce, vr. 

disk. Simpana offers a single virtual client that supports both VMware’s VMware Infrastructure 
and Microsoft's Hyper-V. To learn more, call 888-746-3849 or visit www.commvault.com. 


TRAINING 

Fairy Teaches Database Design and Management 

Need to learn database fundamentals but hate reading textbooks? The Manga Guide to Databases 
uses a Japanese comic book format and a “database fairy” named Tico to teach you how to design 
and manage databases. This book uses real-life examples to walk you through database basics, 
including using set and relational operations to extract data from relational databases, applying 
the entity-relationship model to accurately represent data, setting user permissions, using transac- 
tions and locks, and using SQL to update and retrieve data and create reports. The Manga Guide 
to Databases also discusses more in-depth topics, such as security, replication, disaster recovery, 
and indexing. To learn more, call 800-420-7240 or visit www.nostarch.com. [SQL | 
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ree utilities can help you stretch that dollar. The 

SQL Server 2008 Feature Pack (www.microsoft 
.com/downloads/details.aspx?FamilyID=228de03f- 
3b5a-428a-923f-58a033d316el) contains download- 
able programs, drivers, and add-ons, including the 
following great eight: 
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SQL Server 2005 Driver for PHP 
This PHP 5 extension enables PHP applications to access 
SQL Server 2008 and SQL Server 2005. It requires the 
SQL Server Native Client for SQL Server connectivity. 


SQL Server 2005 JDBC Driver 1.2 
This Type 4 JDBC driver provides access to SQL 
Server 2008, SQL Server 2005, and SQL Server 2000 
from Java applications and applets. 


OLEDB Provider for DB2 

This COM component connects SQL Server 2008 to 
IBM DB2 databases. Use it with Integration Services, 
Analysis Services, Replication, Reporting Services, and 
the Distributed Query Processor. 


SQL Server 2008 Policies 
These policies offer an example of how to use SQL 
Server 2008’s Policy-Based Management (PBM). 


SQL Server 2005 Backward 
Compatibility Components 
Included in the components are SP4 updates for the 
SQL Server 2000 DTS runtime. Also included are 
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Fam SOL Server 2008 Feature Pack 


SQL Server 2005 SP2 versions of SQL Distributed 
Management Objects (SQL-DMO), the Decision Sup- 
port Objects (DSO), and SQL Virtual Device Interface 
(SQLVDI). 


Data Mining Viewer Controls 
This is a set of Windows Forms controls that enables 
developers to display SQL Server 2008 data-mining 
models in their Windows client applications. The Data 
Mining Web Controls display the data patterns that are 
generated using Analysis Services’ mining models. 


Data Mining Add-ins for 
Microsoft Office 2007 

These add-ins let you do more with data mining. The 
Table Analysis Tools for Excel enable you to use SQL 
Server 2008 data-mining features from Excel; the Data 
Mining Client for Excel lets you go through the data- 
mining model-development lifecycle using Excel 2007 
spreadsheet data, and the Data Mining Templates for 
Visio display your data-mining models in Visio 2007. 


SQL Server 2008 Report 
Builder 2.0 
This new, end-user report-authoring tool was origi- 
nally scheduled to be delivered with SQL Server 2008 
but didn’t make the final release. Now available in the 
Feature Pack, it has a new Office-style look and feel. 
The Report Builder 2.0 component requires the .NET 
Framework 3.5. Sou 
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Humphries 
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Windows IT Pro associate and regular 
contributor to Windows IT Pro, SQL 
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She specializes in the Windows IT Pro 
network of tools and resources. 
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The New Standard in Monitoring and Performance 
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* Wait Resources are automatically resolved for you, so for the first time you can instantly 39 12410) 
see exactly which tables or indexes are being contended for without any manual effort. roe (0) 


Generate alerts with full block chain details whenever a block occurs, when a block 
exceeds a specific duration, or when a blocking query contain specific text. 


Graphical display of all deadlock types, synchronized with an innovative grid-based 
display to provide the most complete picture of deadlocks available. 


Deadlock nodes contain the concise information you need to quickly visualize the 
deadlock and determine the root cause. 


Generate alerts including the deadlock victim SQL whenever a deadlock occurs, or 
when the victim query contains specific text. SOL Server Deadlocks 


Performance Dashboard with relevant SOL Server + Top SOL analysis highlights heaviest queries 


ele B I NS Ere * Graphical blocking and deadlock analysis 


Real-time and historical performance analysis : 


Calendar views of Top SQL, blocks and deadlocks 


Disk activity, latency, and capacity monitoring * One-click and automated tracing with Quick Trace™ 


Free Trial Download: sglsentry.net/sql-performance SENTRY 
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iCjlere 
Being on call doesn't have to mean putting your life on hold now that Idera's SOL diagnostic manager 


includes the award-winning SOL mobile manager. SOL mobile manager allows DBAs to tackle even the 
toughest SOL Server problems anytime, from anywhere. See real-time performance metrics, take action 


to quickly fix issues and monitor SOL Server jobs all from your Blackberry or Windows Mobile device. 


With over 100,000 SQL Servers monitored worldwide, Idera leads the market in SQL Server performance and diagnostics solutions. 


Our products are easy to use and low-impact. What's more, you'll be up and running in minutes! Www.idera.com 


